Yash Sakhuja

Getting Started with Object Detection using YOLO

Updated: Jun 15, 2024

Object Detection with YOLOv8x

The UEFA Champions League final weekend wouldn't be complete without me sharing a football-related sample project. In this brief blog, I'll demonstrate how you can utilise the YOLO model from Ultralytics to run basic object detection in Python. For this example, I have used a 30-second video clip from the FA Cup quarter-final between Manchester United and Liverpool at Old Trafford.


We shall accomplish all of this in just 5 lines of Python code, which essentially load the pre-trained YOLO model and display the results. In this blog, I'll illustrate how to produce a straightforward detection output similar to the image shown above, while also introducing essential object detection terminology. I'll also touch on ways to improve this model's accuracy (by training on use-case-specific labelled datasets) and its visual appeal, which will be explored further in the subsequent blogs of this object detection series.


Understanding Key Terminology


Here are some key terms utilised in object detection tasks, which I'll be using as we progress with this project.


Bounding Box: A rectangular frame that encloses an object within an image. There are several bounding box formats. You can read more about those here. In this project, I prefer the x,y,x,y format for its intuitive nature: (x_min, y_min, x_max, y_max), where (x_min, y_min) are the coordinates of the top-left corner and (x_max, y_max) are the coordinates of the bottom-right corner.
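To make the difference between formats concrete, here is a minimal sketch that converts between the corner format used in this project and the centre-based (x_center, y_center, width, height) format, which is what Ultralytics exposes as xywh. Coordinates are in pixels with the origin at the top-left of the image and y increasing downwards. The helper names xyxy_to_xywh and xywh_to_xyxy are hypothetical, written for illustration rather than taken from any library.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def xyxy_to_xywh(box: Box) -> Box:
    """Convert (x_min, y_min, x_max, y_max) to (x_center, y_center, width, height)."""
    x_min, y_min, x_max, y_max = box
    width = x_max - x_min
    height = y_max - y_min
    return (x_min + width / 2, y_min + height / 2, width, height)


def xywh_to_xyxy(box: Box) -> Box:
    """Convert (x_center, y_center, width, height) back to corner coordinates."""
    x_c, y_c, width, height = box
    return (x_c - width / 2, y_c - height / 2, x_c + width / 2, y_c + height / 2)


# A 40 x 80 px box whose top-left corner is at (100, 50)
print(xyxy_to_xywh((100, 50, 140, 130)))  # (120.0, 90.0, 40, 80)
```

The two functions are inverses of each other, so a box can be round-tripped between formats without losing information.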


(x,y,x,y) Bounding Box



Object Class: The category or type of an object present in the image, such as 'person' or 'sports ball'.


Confidence Score: A value between 0 and 1 that expresses how certain the model is about a predicted bounding box and its class label; the higher the score, the more confident the prediction.
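In practice, the confidence score is mostly used as a cut-off for discarding weak detections. Ultralytics applies such a threshold internally through the conf argument of predict (0.25 by default); the sketch below shows the same idea in plain Python. The Detection record, the filter_by_confidence helper and the sample values are made up for illustration and are not Ultralytics output.

```python
from typing import List, NamedTuple, Tuple


class Detection(NamedTuple):
    class_name: str
    confidence: float  # score in [0, 1] reported by the detector
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def filter_by_confidence(detections: List[Detection], threshold: float = 0.5) -> List[Detection]:
    """Keep only detections whose confidence is at least `threshold`."""
    return [d for d in detections if d.confidence >= threshold]


# Hypothetical detections for a single frame
frame = [
    Detection("person", 0.91, (100, 50, 140, 130)),
    Detection("sports ball", 0.32, (300, 210, 312, 222)),
    Detection("person", 0.48, (500, 60, 530, 140)),
]
for d in filter_by_confidence(frame, threshold=0.5):
    print(d.class_name, d.confidence)  # prints: person 0.91
```

Choosing the threshold is a trade-off: a higher value removes more false positives, but it can also drop genuine detections that the model finds harder, such as a small, fast-moving ball.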


Tracking: Tracking extends object detection by following objects as they move across the frames of a video. While object detection identifies objects in each frame independently, tracking lets the system associate the same object across frames, typically by giving it a persistent ID, so that its movement can be followed from one frame to the next.
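To illustrate how "remembering" objects across frames can work, here is a deliberately simple sketch that matches each new box to the previous frame's box with the highest Intersection over Union (IoU) and reuses its ID. This greedy IoU matcher is a toy chosen for clarity; it is not what Ultralytics uses (its model.track relies on dedicated trackers such as BoT-SORT and ByteTrack). The class name GreedyIoUTracker, the 0.3 threshold and the sample boxes are all assumptions for illustration.

```python
from itertools import count
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two corner-format boxes."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


class GreedyIoUTracker:
    """Toy tracker: reuse the ID of the best-overlapping box from the previous frame."""

    def __init__(self, iou_threshold: float = 0.3):
        self.iou_threshold = iou_threshold
        self.tracks: Dict[int, Box] = {}  # track id -> last seen box
        self._ids = count(1)  # source of fresh track ids

    def update(self, boxes: List[Box]) -> List[int]:
        """Assign a track id to every box detected in the current frame."""
        unmatched = dict(self.tracks)
        new_tracks: Dict[int, Box] = {}
        ids: List[int] = []
        for box in boxes:
            best_id: Optional[int] = None
            best_iou = self.iou_threshold
            for track_id, prev_box in unmatched.items():
                score = iou(box, prev_box)
                if score >= best_iou:
                    best_id, best_iou = track_id, score
            if best_id is None:
                best_id = next(self._ids)  # no good match: start a new track
            else:
                del unmatched[best_id]  # each previous track is matched at most once
            new_tracks[best_id] = box
            ids.append(best_id)
        self.tracks = new_tracks  # tracks not seen in this frame are dropped
        return ids


tracker = GreedyIoUTracker()
print(tracker.update([(100, 50, 140, 130), (500, 60, 530, 140)]))  # [1, 2]
print(tracker.update([(505, 62, 535, 142), (104, 52, 144, 132)]))  # [2, 1]
```

Even though the two players appear in a different order in the second frame, each keeps its ID because its box barely moved. Real trackers add motion models and appearance cues to survive occlusions and fast movement, which a pure IoU match cannot handle.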



Loading and Running a Simple YOLO Model


To kick off the coding process, we import YOLO from the Ultralytics package. Next, we load the YOLOv8x model, the largest of the YOLOv8 variants. Here is a list of the YOLOv8 model variants and their performance:


YOLOv8 Model Variants Description

Then, we call the model's predict method on the video file stored in the input_videos folder, setting save=True so that the annotated output video is written to the runs folder, which Ultralytics creates automatically. Bingo! YOLO handles everything under the hood, running each video frame through the neural network.


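## Importing packages
from ultralytics import YOLO

## Loading the pre-trained YOLOv8x weights (downloaded automatically on first use)
model = YOLO('yolov8x.pt')


## Running the YOLO model on every frame of the video; save=True writes an annotated copy under runs/
results = model.predict('input_videos/FA_Cup_2024.mp4', save=True)

## Show the results for the first frame
print(results[0])

## Optional: show each bounding box detected in the first frame
print("################################")
for box in results[0].boxes:
    print(box)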

Interpreting Results


Once all the video frames have been processed, we print the results for the first frame. Among other things, they include names, the full list of classes the pre-trained model can detect (the 80 COCO classes), among them the two we need: person and sports ball.
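Because names is a dictionary that maps class IDs to class names, it can be used to look up the IDs of just the classes we care about. The sketch below does this with a hypothetical helper, class_ids_for, applied to a small excerpt of the COCO names (a real model reports all 80 entries through results[0].names).

```python
from typing import Dict, Iterable, List


def class_ids_for(names: Dict[int, str], wanted: Iterable[str]) -> List[int]:
    """Return the class ids whose names appear in `wanted`, in ascending id order."""
    wanted_set = set(wanted)
    missing = wanted_set - set(names.values())
    if missing:
        raise KeyError(f"Unknown class names: {sorted(missing)}")
    return [class_id for class_id, name in sorted(names.items()) if name in wanted_set]


# Excerpt of the COCO class names reported by a pre-trained YOLOv8 model
coco_excerpt = {0: "person", 1: "bicycle", 2: "car", 32: "sports ball", 33: "kite"}
print(class_ids_for(coco_excerpt, ["person", "sports ball"]))  # [0, 32]
```

The resulting IDs could then be passed to the classes argument of predict, for example model.predict('input_videos/FA_Cup_2024.mp4', classes=[0, 32], save=True), so that only people and the ball are kept in the output.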


Results 1

When we print the individual bounding boxes, as shown below, each box records the predicted object class (cls), the confidence score of the prediction (conf) and the box coordinates in several formats, including xyxy. Because we called predict rather than a tracking method, tracking is disabled: every frame is processed independently and no object IDs are carried across frames. We will review tracking in more detail in the subsequent blogs of this series.
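The class IDs stored in each box are enough for simple per-frame statistics, such as how many people were detected. Below is a minimal sketch with a hypothetical helper, count_classes; the class IDs in the example are invented. With Ultralytics results, one would expect to feed it something like r.boxes.cls.tolist() and r.names for each r in results, keeping in mind that the IDs come back from the tensor as floats.

```python
from collections import Counter
from typing import Dict, Sequence


def count_classes(class_ids: Sequence[float], names: Dict[int, str]) -> Counter:
    """Count how many boxes of each class were detected in one frame."""
    # Class ids may arrive as floats (e.g. 0.0) when taken from a tensor, so cast to int.
    return Counter(names[int(c)] for c in class_ids)


# Hypothetical class ids for one frame: four people and one ball
frame_class_ids = [0.0, 0.0, 32.0, 0.0, 0.0]
print(count_classes(frame_class_ids, {0: "person", 32: "sports ball"}))
# Counter({'person': 4, 'sports ball': 1})
```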


Results 2: Bounding Box

That's some accurate output from just 5 lines of code, isn't it?




Enhancing the Model and Results


Our current model is basic but meets our needs. However, we can fine-tune it to distinguish more accurately between players and referees, and to ignore everything on the sidelines, by training a YOLO model on a labelled football dataset available on Roboflow. We can also make the output more visually appealing by formatting the bounding boxes with OpenCV functions and by enabling tracking.


Here's a quick sample of the final output we will aim for:



I hope this brief blog will serve as a quick getting-started guide, helping us to build and serve this model. Over the next couple of weeks, we will aim to develop and enhance it even further.


London 2024

Until then, Hala Madrid y nada más!


Signing Off

Yash




