The prediction accuracy of a model on a specific task depends greatly on its training data. Models trained on general-purpose datasets usually yield generic results. When trained on datasets labeled for a particular use case and domain, however, they produce more precise outcomes. In this blog post, we will build an enhanced version of our YOLO model by training it on a labeled football player detection dataset from Roboflow. The dataset is open-source and contains 663 images labeled with four classes: player, goalkeeper, ball, and referee.
If you have not read my previous blog post on Getting Started with Object Detection using YOLO, I highly recommend doing so before diving into this one.
Continuing from where we left off in the previous blog post, we had a general YOLO model running on a 30-second video clip from the FA Cup 2024 semi-final between Manchester United and Liverpool. The output from the model is displayed below:
Output from the general run of the YOLO model
Google Colaboratory Requirements
Before we proceed with training the model, there are two important points to highlight. First, I will be using Google Colaboratory instead of my own system due to the high computational requirements and the lack of a GPU on my machine. Google Colaboratory offers a few hours of free GPU runtime each day, which will be beneficial for our purposes.
Second, before we start coding, you need to change the runtime type from CPU to T4 GPU (Runtime > Change runtime type).
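Before going further, it can be worth confirming that the runtime really has a GPU attached. The snippet below is a minimal sketch: `gpu_runtime_available` is a hypothetical helper name, and it assumes that either PyTorch is installed (as it is on Colab) or the `nvidia-smi` tool is on the PATH.

```python
import shutil


def gpu_runtime_available() -> bool:
    """Return True if a CUDA GPU appears to be attached to this runtime."""
    try:
        import torch  # preinstalled on Colab; may be absent elsewhere
    except ImportError:
        # Fallback (assumption): an NVIDIA driver install exposes nvidia-smi.
        return shutil.which("nvidia-smi") is not None
    return bool(torch.cuda.is_available())


print("GPU available:", gpu_runtime_available())
```

If this prints False, recheck the runtime type before starting a long training run.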
Model Training
We begin by installing the required Ultralytics (for the YOLO model) and Roboflow (for the labeled dataset) packages.
## Installing Packages
!pip install ultralytics
!pip install roboflow
Once the installation is complete, we proceed with obtaining the dataset from Roboflow. Be sure to sign up for Roboflow first; the signup process is relatively straightforward. After signing up, you can access the link to the dataset.
Now, paste this code snippet into a Google Colab cell, replacing the placeholder with your own API key.
# Download code for YOLOv5 pasted from Roboflow
from roboflow import Roboflow
rf = Roboflow(api_key="YOUR API KEY")
project = rf.workspace("roboflow-jvuqo").project("football-players-detection-3zvbc")
version = project.version(1)
dataset = version.download("yolov5")
Running this code will download the dataset and create a folder named 'football-players-detection-1'. This folder will contain train, test, and valid sub-folders, each with images and their labels, along with a useful YAML file containing all the other relevant information.
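To check that the download worked, you can count the images in each split. This is a minimal sketch that assumes each split folder holds an `images` sub-folder, as described above; `count_split_images` is a hypothetical helper name, not part of the Roboflow package.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_split_images(dataset_root):
    """Count image files under <root>/<split>/images for each split."""
    counts = {}
    for split in ("train", "valid", "test"):
        image_dir = Path(dataset_root) / split / "images"
        if image_dir.is_dir():
            counts[split] = sum(
                1 for p in image_dir.iterdir()
                if p.suffix.lower() in IMAGE_SUFFIXES
            )
        else:
            # A missing split folder is reported as zero images.
            counts[split] = 0
    return counts


print(count_split_images("football-players-detection-1"))
```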
Each label line starts with a class identifier, such as '2' for the class 'player', as defined by the class names in the YAML file. The bounding box then follows in the x, y, w, h format: the coordinates of the box center, then its width and height, all normalized to the image size. More detailed explanations of the bounding box formats are provided here.
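To make the label format concrete, here is a minimal sketch that turns one label line into pixel coordinates. It assumes the standard YOLO text layout (class id, then normalized center x, center y, width, height); the function name and the sample line are made up for illustration.

```python
def yolo_to_pixel_box(label_line, img_w, img_h):
    """Convert 'class xc yc w h' (normalized) to (class_id, pixel corners)."""
    class_id, xc, yc, w, h = label_line.split()
    # Scale the normalized values back to pixels.
    xc, w = float(xc) * img_w, float(w) * img_w
    yc, h = float(yc) * img_h, float(h) * img_h
    # Convert center/size to top-left and bottom-right corners.
    x_min, y_min = xc - w / 2, yc - h / 2
    x_max, y_max = xc + w / 2, yc + h / 2
    return int(class_id), (x_min, y_min, x_max, y_max)


# Illustrative label: a 'player' (class 2) box centered in a 1920x1080 frame.
print(yolo_to_pixel_box("2 0.5 0.5 0.25 0.5", 1920, 1080))
# (2, (720.0, 270.0, 1200.0, 810.0))
```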
Now that we have our dataset, we just need to configure a few more settings before proceeding with training the model. Ultralytics requires separating the train, test, and valid sub-folders from other files within the main folder, as per the model setup. We will achieve this using the shutil package, which allows us to perform file operations such as creating new folders and moving existing files and folders around.
# Ultralytics Folder naming Convention Requirement
import shutil
shutil.move('football-players-detection-1/train','football-players-detection-1/football-players-detection-1/train')
shutil.move('football-players-detection-1/test','football-players-detection-1/football-players-detection-1/test')
shutil.move('football-players-detection-1/valid','football-players-detection-1/football-players-detection-1/valid')
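One caveat: running the cell above a second time raises an error, because the source folders have already been moved. A minimal sketch of a re-runnable variant follows; `nest_splits` is a hypothetical helper, and it assumes the same folder layout as above.

```python
import shutil
from pathlib import Path


def nest_splits(root, splits=("train", "test", "valid")):
    """Move <root>/<split> into <root>/<root name>/<split>, skipping done ones."""
    root = Path(root)
    nested = root / root.name
    nested.mkdir(exist_ok=True)
    moved = []
    for split in splits:
        src, dst = root / split, nested / split
        # Only move folders that exist and have not been moved already.
        if src.is_dir() and not dst.exists():
            shutil.move(str(src), str(dst))
            moved.append(split)
    return moved


if Path("football-players-detection-1").is_dir():
    print(nest_splits("football-players-detection-1"))
```

The function returns the list of splits it actually moved, so a second run simply returns an empty list.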
This additional line of code configures how PyTorch manages memory on the GPU. It tells the CUDA memory allocator to use "expandable segments", which reduces memory fragmentation and can help avoid out-of-memory errors during memory-intensive tasks such as training.
%env PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
In simpler terms, think of it as a memory management trick that helps our detection training fit within the GPU's memory.
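The `%env` magic only works inside a notebook. A minimal sketch of the same setting from a plain Python script, assuming it runs before PyTorch makes its first CUDA allocation:

```python
import os

# Assumption: this runs before any CUDA memory is allocated by PyTorch,
# otherwise the allocator will not pick up the setting.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```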
Finally, with just one more line of code, we can kick off our model training.
Before you click "run" on this, have some other tasks on hand, as it might take a while to finish. But once it's done, we'll have the best model weights, and we won't need to repeat this process.
# Training the model - !! Takes time
!yolo task=detect mode=train model=yolov5l.pt data={dataset.location}/data.yaml epochs=100 imgsz=640
We will train our YOLOv5l detection model for 100 epochs, with the input image size set to 640.
Epochs: In machine learning, an epoch is one complete pass of the entire training dataset through the neural network. Within an epoch the data is processed in batches, and each batch involves a forward pass (where data is input into the network and predictions are made) and a backward pass (where the network learns from its errors and adjusts its parameters). Increasing the number of epochs allows the model to learn more from the data, but it also increases training time.
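To get a feel for how much work 100 epochs involves, here is a small back-of-the-envelope sketch. It uses the 612 training images mentioned later in this post; the batch size of 16 is an assumption, so adjust it to whatever your run actually uses.

```python
import math


def training_iterations(num_images, batch_size, epochs):
    """Return (batches per epoch, total batches over all epochs)."""
    batches_per_epoch = math.ceil(num_images / batch_size)
    return batches_per_epoch, batches_per_epoch * epochs


# batch_size=16 is an assumed value, not taken from the training command.
per_epoch, total = training_iterations(num_images=612, batch_size=16, epochs=100)
print(per_epoch, total)  # 39 3900
```

Each of those batches involves its own forward and backward pass, which is why the training cell takes a while.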
Downloading and Predicting using our Best Model
First, once training completes, you can download the best (or last) model weights.
Navigate to the "runs" folder in the files section, where you'll find a new folder named "detect." Inside "detect," go to the "train" folder and then into the "weights" sub-folder. From there, download the "best.pt" model.
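If you train more than once, several training folders can pile up inside "detect". The sketch below picks the most recently written `best.pt`; it assumes the folder layout described above with training folders whose names start with "train", and `latest_best_weights` is a hypothetical helper name.

```python
from pathlib import Path


def latest_best_weights(runs_dir="runs/detect"):
    """Return the newest train*/weights/best.pt under runs_dir, or None."""
    candidates = list(Path(runs_dir).glob("train*/weights/best.pt"))
    if not candidates:
        return None
    # Pick the file with the most recent modification time.
    return max(candidates, key=lambda p: p.stat().st_mtime)


print(latest_best_weights())
```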
Once you've downloaded the model, place it in a new folder named "models" within your code directory, and then rerun the code from the previous blog post using the trained model weights.
The code below is the same as the prediction code from the previous blog post. The only change is the model name, which now points to the best model weights stored in the "models" folder.
## Importing packages
from ultralytics import YOLO
## Loading the best trained model, saved in the models folder next to the Python code file
model=YOLO('models/best.pt')
## Running the YOLO model and Saving to Output Videos
results = model.predict('input_videos/FA_Cup_2024.mp4',save=True)
## Show the results
print(results[0])
## Show the results of a bounding box
print("#################################################")
for box in results[0].boxes:
    print(box)
This code may take a while to run. Consider taking a break for a walk, a stretch, or a cup of hot chocolate or coffee, whatever you desire. You've earned it. Upon completion, you'll have the final output video you've been working towards.
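Printing every box gets noisy quickly, so a per-class tally is often easier to read. The sketch below is a minimal, self-contained version: in a real run you would pass `results[0].boxes.cls.tolist()` and `results[0].names`, while the sample values here are made up for illustration and `count_detections` is a hypothetical helper.

```python
from collections import Counter


def count_detections(class_ids, names):
    """Tally detections per class name from class ids and an id-to-name map."""
    return Counter(names[int(c)] for c in class_ids)


# Illustrative stand-ins for results[0].boxes.cls.tolist() and results[0].names.
example_ids = [2.0, 2.0, 1.0, 3.0, 2.0]
example_names = {0: "ball", 1: "goalkeeper", 2: "player", 3: "referee"}

print(count_detections(example_ids, example_names))
```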
Results and Next Steps
Once the model completes its run, it stores the results in the "runs" folder, specifically in the most recent "predict" sub-folder within it. Let's see what the output looks like.
The model is now more tailored for football, accurately distinguishing between players, goalkeepers, and referees, as demonstrated in the image above. At this stage, the model's limitations may stem from the relatively small training dataset comprising only 612 images, potentially resulting in inaccurate detections. Additionally, the presence of moving camera angles could pose challenges in accurately identifying objects.
In the upcoming blogs of this series on building a football player detection system in Python, we'll aim to enhance the visual appeal of our output. We'll achieve this by implementing player tracking using ByteTrack and refining the appearance of bounding boxes for a cleaner look using OpenCV.
If you have any feedback about this blog post or any of my previous work, please don't hesitate to get in touch. I appreciate constructive feedback and am always striving to improve.
Signing Off,
Yash