Over the previous two blog posts, we've been gradually building towards this moment. Now we'll bring everything together: we'll add ByteTrack object tracking to our basic bounding box output and then restyle the boxes using OpenCV drawing functions. If you haven't yet read the first two blogs in this series, Getting Started with Object Detection in YOLO and Training YOLO with a labelled Roboflow dataset, I strongly recommend doing so for a better understanding.
Unlike my previous two posts, this one will be easier to follow if you have at least an intermediate knowledge of Python data structures (such as lists, dictionaries, and tuples) and object-oriented programming, because the code below relies heavily on functions and classes. This task is code-intensive, so please follow the comments next to each line of code to understand what it does. I've also explained the purpose of each code snippet in the text. With that disclaimer out of the way, let's dive straight in!
In the previous blog post of this series, I demonstrated how to use a task-specific labeled Roboflow dataset to build a model tailored to our use case. Our final output showed that the trained model could accurately identify players, goalkeepers, referees, and sports balls. The picture below shows the output where we last left off.
The aim of this blog is to restyle those bounding boxes as ellipse-shaped player highlights, each with the player's tracking number on top. Referees will be highlighted similarly, but in yellow, and the ball will be marked with a triangle. The final output should resemble the picture shown below.
We start by importing the required packages:
As previously mentioned, this task is somewhat code-intensive. To organize and manage the code more effectively, we will split it into a set of classes and functions. We use a total of 10 functions to read the video, save the video, run detection on batches of frames, track objects, annotate the frames, and handle a couple of additional sub-tasks. Six of these 10 functions (including the constructor) belong to a class called Tracker, which performs all the prediction, tracking, and annotation tasks. For beginner readers who would like to learn more about functions, classes, and constructors in Python, please follow these links.
Above all these functions sits a main function, which serves as the entry point of our program (Python has no built-in main function; by convention it is called from an if __name__ == "__main__": block at the bottom of the script). It acts as an orchestrator, calling other functions and initializing classes in the order needed to perform our tasks. The main function performs the following tasks in order, ensuring the proper flow of our code logic:
Reads the video from the specified path and returns the frames as a list.
Creates an instance of the Tracker class, loading our trained model from the specified path.
Calls the get_object_tracks method of the Tracker instance to obtain object tracks for the video frames.
Calls the draw_annotations method of the Tracker instance to draw the object tracks (players, referees, ball) on the video frames.
Converts the annotated frames into a video file and saves it at the specified path.
The main() Function
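The main() code itself is not reproduced in this post, so here is a minimal sketch of the five steps listed above. To keep it runnable without a model or a video file, the collaborators are passed in as arguments; in the real script they would be read_video, the Tracker class and save_video. The file paths and the read_from_stub/stub_path keyword arguments are hypothetical names chosen for illustration.

```python
from typing import Any, Callable, List


def main(
    read_video: Callable[[str], List[Any]],
    make_tracker: Callable[[str], Any],
    save_video: Callable[[List[Any], str], None],
    video_path: str = "input_videos/match.mp4",    # hypothetical path
    model_path: str = "models/best.pt",             # hypothetical path
    output_path: str = "output_videos/output.avi",  # hypothetical path
    stub_path: str = "stubs/track_stubs.pkl",       # hypothetical path
) -> None:
    # 1. Read the input video into a list of frames
    frames = read_video(video_path)
    # 2. Create the tracker around our trained model
    tracker = make_tracker(model_path)
    # 3. Detect and track players, referees and the ball (reusing cached tracks if available)
    tracks = tracker.get_object_tracks(frames, read_from_stub=True, stub_path=stub_path)
    # 4. Draw the styled annotations on every frame
    output_frames = tracker.draw_annotations(frames, tracks)
    # 5. Write the annotated frames back out as a video
    save_video(output_frames, output_path)


# Typical entry point (requires the real read_video, Tracker and save_video):
# if __name__ == "__main__":
#     main(read_video, Tracker, save_video)
```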
Reading Input Video as Frames
The function read_video reads all the frames from the input video file located at video_path and returns them as a list. It uses OpenCV's VideoCapture to open the video file and reads frames in a loop until it reaches the end of the video. Each successfully read frame is appended to the frames list, which the function then returns.
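Since the original snippet is not shown, here is a minimal sketch of the same reading loop. To keep it testable without OpenCV or a video file, the loop takes any object with cv2.VideoCapture's read()/release() interface, where read() returns a (success, frame) tuple; read_video_frames is a hypothetical helper name.

```python
from typing import Any, List


def read_video_frames(cap: Any) -> List[Any]:
    """Read every frame from an already-opened, cv2.VideoCapture-like object."""
    frames = []
    while True:
        ret, frame = cap.read()  # ret is False once the video ends (or a read fails)
        if not ret:
            break
        frames.append(frame)
    cap.release()  # free the underlying file handle
    return frames


# With OpenCV installed, read_video would then be roughly:
# def read_video(video_path):
#     return read_video_frames(cv2.VideoCapture(video_path))
```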
Some Additional Supporting Functions
These are helper functions called by the methods of the Tracker class to perform small sub-tasks.
get_center_of_bbox calculates and returns the center coordinates of a bounding box.
get_bbox_width calculates and returns the width of a bounding box.
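A minimal sketch of the two helpers, assuming bounding boxes are in (x1, y1, x2, y2) corner format, which is what YOLO and supervision report as xyxy. The center is converted to integers because OpenCV's drawing functions expect integer pixel coordinates.

```python
from typing import Sequence, Tuple


def get_center_of_bbox(bbox: Sequence[float]) -> Tuple[int, int]:
    # Midpoint of the left/right edges and of the top/bottom edges
    x1, y1, x2, y2 = bbox
    return int((x1 + x2) / 2), int((y1 + y2) / 2)


def get_bbox_width(bbox: Sequence[float]) -> float:
    # Horizontal distance between the right and left edges
    return bbox[2] - bbox[0]
```

For example, a box [100, 50, 140, 130] has its center at (120, 90) and a width of 40.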
2. Tracker Class: Tracking and Annotation Functions
In the code snippet below, we start defining the Tracker class, which contains the methods for tracking and for annotating bounding boxes in frames. The constructor __init__ initializes both the YOLO model and the ByteTrack tracker from the supervision package. Because both are assigned as attributes of self, they become instance variables that every method of the class can access. From then on, any method that needs the YOLO model or the tracker refers to it through self.
Next, we define a series of methods within the class to perform the primary tasks it was designed for: tracking and annotation.
Detecting Frames and Predicting for Frame Batches
The detect_frames function runs object detection on a list of frames in batches of 20 using the YOLO model. It aggregates the detections from all batches and returns them as a single list (detections). It is called from get_object_tracks, which uses those detections for tracking.
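A minimal sketch of the batching logic. The model is represented here by a plain predict callable so the example runs without Ultralytics; inside the Tracker class this would presumably be the YOLO model's predict method, which accepts a list of images and returns one result per image. The batch size of 20 matches the text; any confidence threshold the original passes is not shown and is left out.

```python
from typing import Any, Callable, List, Sequence


def detect_frames(
    frames: Sequence[Any],
    predict: Callable[[List[Any]], List[Any]],
    batch_size: int = 20,
) -> List[Any]:
    """Run `predict` on consecutive batches of frames and collect all results."""
    detections: List[Any] = []
    for i in range(0, len(frames), batch_size):
        batch = list(frames[i:i + batch_size])  # the last batch may be smaller
        detections += predict(batch)            # one result per frame in the batch
    return detections
```

Batching keeps memory use bounded while still letting the model process several frames per call.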
2a) Getting Object Tracks
Make sure to create a 'stubs' folder in the working directory to store the cached track files.
The function get_object_tracks is pivotal and complex, performing tasks ranging from object detection (via detect_frames) to tracking players, the ball, and referees. Because it is computationally intensive, it takes a significant amount of time to execute. To mitigate this, we use stubs: Python's pickle package saves the tracks after the first successful run and loads them on later runs. This lets subsequent executions reuse the precomputed tracks, saving substantial computation time.
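The stub-caching pattern can be sketched independently of the tracker. The helper name load_or_compute and its parameters are hypothetical; the original get_object_tracks presumably inlines this logic around its detection and tracking code. The os.makedirs call also creates the stubs folder if it does not exist yet.

```python
import os
import pickle
from typing import Any, Callable, Optional


def load_or_compute(
    compute: Callable[[], Any],
    read_from_stub: bool = False,
    stub_path: Optional[str] = None,
) -> Any:
    # Reuse previously saved results if requested and the stub file exists
    if read_from_stub and stub_path is not None and os.path.exists(stub_path):
        with open(stub_path, "rb") as f:
            return pickle.load(f)

    result = compute()  # the expensive detection + tracking step

    # Cache the result so the next run can skip the computation
    if stub_path is not None:
        os.makedirs(os.path.dirname(stub_path) or ".", exist_ok=True)
        with open(stub_path, "wb") as f:
            pickle.dump(result, f)
    return result
```

Only load pickle files you created yourself, since unpickling untrusted data can execute arbitrary code.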
The function begins by checking whether tracking data can be read from a saved file specified by the user. If not, it detects objects such as players, referees, and the ball in the video frames using the detect_frames function. A tracks dictionary is initialized to accumulate the tracking information across frames for players, referees, and the ball. The detections are then converted into a format the tracker accepts and preprocessed. For instance, any object detected as a "goalkeeper" is relabeled as a "player", so that goalkeepers are tracked and drawn in the same way as the other players.
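The goalkeeper-to-player relabeling boils down to inverting the model's id-to-name mapping (Ultralytics results expose this as a names dictionary) and swapping one class ID for another. The function name and the example class IDs in the usage line are assumptions, not the dataset's actual label order; in the original code the same swap is presumably applied to the detections' class IDs before the tracker update.

```python
from typing import Dict, List


def remap_goalkeepers(class_ids: List[int], class_names: Dict[int, str]) -> List[int]:
    """Return class IDs with every 'goalkeeper' replaced by the 'player' ID."""
    name_to_id = {name: idx for idx, name in class_names.items()}  # invert id -> name
    goalkeeper_id = name_to_id.get("goalkeeper")  # None if the model has no such class
    player_id = name_to_id["player"]
    return [player_id if cid == goalkeeper_id else cid for cid in class_ids]
```

With hypothetical names {0: "ball", 1: "goalkeeper", 2: "player", 3: "referee"}, the IDs [0, 1, 2, 3, 1] become [0, 2, 2, 3, 2].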
The tracker is then updated with these processed detections, associating each detection with its corresponding frame and object type and assigning it a tracking ID. The tracked bounding boxes and IDs are stored in the tracks dictionary under "players" and "referees". Similarly, the ball's bounding box is stored under "ball".
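One plausible layout for the tracks dictionary is tracks[object_type][frame_num][track_id] = {"bbox": [...]}, with one inner dictionary per frame. This layout, the build_tracks helper, its input format of (bbox, class_name, track_id) tuples, and storing the single ball under a fixed key of 1 are all assumptions for illustration.

```python
from typing import Dict, List, Tuple

BBox = List[float]


def build_tracks(
    tracked_per_frame: List[List[Tuple[BBox, str, int]]],
    ball_per_frame: List[List[BBox]],
) -> Dict[str, List[Dict[int, Dict[str, BBox]]]]:
    tracks: Dict[str, List[Dict[int, Dict[str, BBox]]]] = {
        "players": [], "referees": [], "ball": []
    }
    for frame_num, tracked in enumerate(tracked_per_frame):
        # Start an empty {track_id: {"bbox": bbox}} dictionary per object type for this frame
        for key in tracks:
            tracks[key].append({})
        for bbox, class_name, track_id in tracked:
            if class_name == "player":
                tracks["players"][frame_num][track_id] = {"bbox": bbox}
            elif class_name == "referee":
                tracks["referees"][frame_num][track_id] = {"bbox": bbox}
        # Assumption: there is only one ball, so it always uses the key 1
        for bbox in ball_per_frame[frame_num]:
            tracks["ball"][frame_num][1] = {"bbox": bbox}
    return tracks
```

Indexing by frame first and track ID second makes it cheap for the annotation step to fetch everything it must draw on a given frame.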
Finally, once all detections are processed, the function saves the tracks dictionary to a pickle file at the provided stub path in the stubs folder, so that later runs can skip the redundant computation. get_object_tracks then returns the tracks dictionary containing the tracking data for players, referees, and the ball across all analyzed frames.
Now that we have both detections and ByteTrack-based tracking for our video, it's time to annotate these tracks and begin the formatting process using OpenCV.
Function to Draw Ellipse Annotation for Player and Referee Bounding Boxes
This function performs a three-part annotation using OpenCV's drawing functions. First, it replaces each bounding box with an ellipse drawn using cv2.ellipse(). Next, it draws a rectangle with cv2.rectangle() to hold the tracking ID, and finally it uses cv2.putText() to write the tracking ID inside that rectangle. Each section of the function is commented to explain the formatting choices for all three annotations.
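The original draw_ellipse is not shown, so here is a minimal sketch of the geometry such a function needs. It only computes the arguments for cv2.ellipse(), cv2.rectangle() and cv2.putText(), so it runs without OpenCV; highlight_geometry is a hypothetical name, and the 0.35 axis ratio, the -45° to 235° arc and the label offsets are illustrative assumptions, not the values used in this project.

```python
from typing import Dict, Sequence, Tuple


def highlight_geometry(
    bbox: Sequence[float], label_w: int = 40, label_h: int = 20
) -> Dict[str, Tuple[int, ...]]:
    """Compute the shapes drawn for one player or referee (ratios/offsets are assumptions)."""
    x1, _, x2, y2 = bbox
    x_center = int((x1 + x2) / 2)
    y_feet = int(y2)          # the bottom edge of the box is roughly at the player's feet
    width = int(x2 - x1)

    # Inputs for cv2.ellipse(img, center, axes, angle, startAngle, endAngle, color, thickness)
    center = (x_center, y_feet)
    axes = (width, round(0.35 * width))  # flattened, so it reads as a ring on the ground
    angles = (0, -45, 235)               # rotation, start angle, end angle (degrees)

    # Inputs for cv2.rectangle(img, pt1, pt2, color, cv2.FILLED): a small box for the ID
    top = y_feet + 5
    pt1 = (x_center - label_w // 2, top)
    pt2 = (x_center + label_w // 2, top + label_h)

    # Input for cv2.putText(img, text, org, ...): org is the text's bottom-left corner
    org = (pt1[0] + 5, pt2[1] - 5)

    return {"center": center, "axes": axes, "angles": angles,
            "rect_pt1": pt1, "rect_pt2": pt2, "text_org": org}


# Example use inside a draw_ellipse-style function (requires OpenCV):
# g = highlight_geometry(bbox)
# cv2.ellipse(frame, g["center"], g["axes"], *g["angles"], color, 2)
# cv2.rectangle(frame, g["rect_pt1"], g["rect_pt2"], color, cv2.FILLED)
# cv2.putText(frame, str(track_id), g["text_org"], cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 0), 2)
```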
Function to Draw Triangle for Ball's Bounding Box
This function uses cv2.drawContours() to mark the ball's bounding box with a small green triangle drawn above it. The triangle is horizontally centered on the bounding box, and its size can be adjusted by moving its left and right vertices.
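A minimal sketch of the triangle's vertices, assuming the tip touches the top-center of the ball's bounding box and the marker points downward onto the ball. The function name and the default half_width and height values are hypothetical; changing half_width moves the left and right vertices, which is how the marker's size is adjusted.

```python
from typing import List, Sequence


def triangle_points(
    bbox: Sequence[float], half_width: int = 10, height: int = 20
) -> List[List[int]]:
    """Vertices of a downward-pointing marker whose tip touches the top of the box."""
    x1, y1, x2, _ = bbox
    x_center = int((x1 + x2) / 2)
    y_top = int(y1)
    return [
        [x_center, y_top],                         # tip, at the top-center of the box
        [x_center - half_width, y_top - height],   # upper-left vertex (image y grows downward)
        [x_center + half_width, y_top - height],   # upper-right vertex
    ]


# With OpenCV and NumPy, the points would be drawn as a filled contour, e.g.:
# pts = np.array(triangle_points(bbox), dtype=np.int32)
# cv2.drawContours(frame, [pts], 0, (0, 255, 0), cv2.FILLED)
```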
2b) Calling the Ellipse and Triangle Annotation Functions to Annotate Frames
This function is the primary annotation function called from main(). It iterates through each frame and its associated tracking data, calling the draw_ellipse and draw_triangle functions to stylize the bounding boxes of players, referees, and the ball. It then appends each annotated frame to a list and returns that list as the final output.
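A minimal sketch of this loop, written as a standalone function so it runs without OpenCV. The drawing functions are passed in (inside the class they would be the Tracker's own draw_ellipse and draw_triangle), and their signatures, the assumption that each returns the modified frame, and the tracks[object_type][frame_num][track_id] layout are assumptions. Colors are BGR tuples as OpenCV expects; yellow for referees and green for the ball follow the text, while red for players is an assumption.

```python
from typing import Any, Callable, Dict, List

PLAYER_COLOR = (0, 0, 255)     # assumption: red for players (not stated in the text)
REFEREE_COLOR = (0, 255, 255)  # yellow in BGR
BALL_COLOR = (0, 255, 0)       # green in BGR


def draw_annotations(
    video_frames: List[Any],
    tracks: Dict[str, List[Dict[int, Dict[str, Any]]]],
    draw_ellipse: Callable[..., Any],
    draw_triangle: Callable[..., Any],
) -> List[Any]:
    output_frames = []
    for frame_num, frame in enumerate(video_frames):
        frame = frame.copy()  # draw on a copy so the original frames stay untouched

        for track_id, player in tracks["players"][frame_num].items():
            frame = draw_ellipse(frame, player["bbox"], PLAYER_COLOR, track_id)
        for track_id, referee in tracks["referees"][frame_num].items():
            frame = draw_ellipse(frame, referee["bbox"], REFEREE_COLOR, track_id)
        for ball in tracks["ball"][frame_num].values():
            frame = draw_triangle(frame, ball["bbox"], BALL_COLOR)

        output_frames.append(frame)
    return output_frames
```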
3. Saving the Resulting Output
Finally, once the tracks have been annotated and returned as output frames, it's time to save them. The function save_video takes output_frames (a list of frames) and a file path, and writes the frames to a video file at that path using the XVID codec at 24 frames per second. The width and height of the video are taken from the first frame in the list. After all frames are written, the video writer is released, which finalizes the file.
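A minimal sketch of save_video, with the OpenCV writer supplied through a hypothetical make_writer factory so the example runs without OpenCV. It highlights one easy-to-miss detail: image arrays are shaped (height, width, channels), whereas cv2.VideoWriter expects the frame size as (width, height).

```python
from typing import Any, Callable, List, Tuple


def save_video(
    output_frames: List[Any],
    output_path: str,
    make_writer: Callable[[str, float, Tuple[int, int]], Any],
    fps: float = 24,
) -> None:
    """Write frames using a writer with write()/release(), like cv2.VideoWriter."""
    if not output_frames:
        raise ValueError("no frames to save")
    # Frames are (height, width, channels) arrays, but the writer wants (width, height)
    height, width = output_frames[0].shape[:2]
    writer = make_writer(output_path, fps, (width, height))
    try:
        for frame in output_frames:
            writer.write(frame)
    finally:
        writer.release()  # finalizes the video file


# With OpenCV, a matching factory would be:
# lambda path, fps, size: cv2.VideoWriter(path, cv2.VideoWriter_fourcc(*"XVID"), fps, size)
```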
Voilà! Running these steps gives us the desired output; a frame from it is shown in the image below.
Limitations and Next Steps
Despite yielding fairly accurate results, our model has significant room for improvement, particularly in detecting referees: in some frames, referees are misclassified as players. This may stem from our Roboflow dataset, which contains only 612 training images and therefore has limited diversity. Additionally, our tracking system assigns a new tracking ID whenever a player leaves and re-enters the frame. Using a stable camera angle that covers the entire length of the pitch would therefore likely produce more consistent and accurate tracking.
As we conclude our three-part blog series on Object Detection in Football, I hope you have found it useful. However, our journey towards refining this model is far from over. Possible enhancements include assigning players to teams based on jersey color to calculate each team's ball possession, or interpolating the ball's position to fill in frames where the ball is not detected. The possibilities for improvement are vast.
Thank you for following along with these blogs. If you have accompanied me this far, I appreciate your time and would love to hear your feedback. Please feel free to suggest future topics or express your views in the comments. Your feedback is invaluable to me.
Until then, I hope you enjoy watching the Euros, and I'll be rooting for Cristiano and Team Portugal.
Signing Off,
Yash