
How to create powerful object detection algorithms

One of the most compelling examples of machine learning in action is object detection. 

If you are a machine learning nerd like me, you have probably seen computer vision demonstration videos online.

In these videos, people look into a webcam and change their expression, and the algorithm labels it. For example, the program can identify whether they are happy, sad, angry, excited and so on.

When you first start machine learning, you probably expect computer vision to be your first project.

Then you are distraught to realize that, for the first couple of months, all you are going to be doing is simple regression.

Well, fear not: it's finally time to dive into object detection!

Even if you're not into nerdy YouTube videos about machine learning, you will be familiar with the idea of object detection.

Unless you have been hiding under a rock for the last five years, you will have heard of the concept of self-driving cars.

To be able to go on the road, self-driving cars need to use object detection algorithms to see where they're going.

As you can imagine, there are many important factors that self-driving cars, and other AI systems that will act among us, need to understand.

We are going to talk through some of these considerations today.

We are also going to touch on how you can implement simple object detection algorithms using Python.

Before we get into all of this though, let’s talk a little bit more about this exciting area of computer vision.

How can you use object detection?

Okay, so we've already talked about the application of object detection that people are most familiar with: self-driving cars.

Computer vision technology also has more sinister applications.

If you have read the book 1984, you will be familiar with the idea of a state that is always watching you.

With object detection, such a state becomes a more real possibility.

One of the main benefits of object detection is that it can identify items or people quickly from an image.

For this reason, AI systems that use this form of computer vision to survey crowds and identify people of interest are becoming more mainstream.

Don't worry about it too much: at this point in time, it is unlikely to affect you.

However, the implementation of object detection algorithms for surveillance does pose several ethical concerns.

3 points on the ethics of computer vision

There are multiple data ethics challenges when it comes to implementing computer vision algorithms.

Three of these challenges are outlined below.


Accuracy

One big problem with the commercial deployment of computer vision is the accuracy of the algorithms. When you are using an algorithm to make a prediction or classify an object, particularly in a commercial situation, you must have high accuracy across all potential labels. This is not always the case.
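A quick way to see why accuracy has to hold across every label is to compare overall accuracy with per-label accuracy. The Python sketch below uses made-up toy labels, not real data: a model can look 90% accurate overall while getting only half of the rarer label right.

```python
from collections import defaultdict


def per_label_accuracy(y_true, y_pred):
    """Return overall accuracy and a dict of accuracy for each true label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    overall = sum(correct.values()) / len(y_true)
    per_label = {label: correct[label] / total[label] for label in total}
    return overall, per_label


# Made-up toy labels: "car" is common, "cyclist" is rare.
y_true = ["car"] * 8 + ["cyclist"] * 2
y_pred = ["car"] * 8 + ["car", "cyclist"]

overall, per_label = per_label_accuracy(y_true, y_pred)
print(overall)    # 0.9
print(per_label)  # {'car': 1.0, 'cyclist': 0.5}
```

The headline number hides the weak label, which is why accuracy should be reported per label (or per group of people) before an algorithm is deployed.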

Lack of visibility

Another ethical concern with implementation is a lack of transparency: there is little general understanding of the processes involved. It is tough for people to challenge the output of an algorithm if they are not aware of how their data is used.


Bias

There are multiple examples of bias in the implementation of object detection algorithms. The most widely understood is the higher error rate when identifying people with non-white skin tones.

One cause of this difference is the high proportion of white male developers of commercial object detection software.

This has meant that datasets are overly biased towards people with light skin tones, and the algorithms then show higher accuracy on those skin tones.

This makes it harder to correctly identify people with darker skin tones. When these algorithms are deployed in real-world situations, they are shown to be racially biased, struggling to identify people correctly.


These ethical challenges are just one of the reasons why people of all backgrounds must get involved in the development of object detection algorithms.

Now that you have your call to action, it's time to understand a bit more about the theory of object detection.

A beginner's introduction to object detection theory

If you have not already read my previous tutorial on convolutional neural networks, I recommend that you go back and study it before going into this tutorial.

Here is a link to the CNN tutorial.

Convolutional neural networks are used throughout computer vision. They allow the computer to break an image down and work out what it shows using math.

The different convolutional layers map to various aspects of an image. When you train a model, you teach the computer to recognize the different pixel patterns and apply mathematics to classify them.
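To make the idea of a convolution concrete, here is a minimal pure-Python sketch of a single 3x3 filter sliding over a tiny grayscale image. The filter values are hand-picked for illustration; in a CNN the network learns them during training. Like most deep learning libraries, it computes a cross-correlation (the kernel is not flipped) and only visits positions where the kernel fits fully inside the image.

```python
def conv2d(image, kernel):
    """'Valid' 2D convolution as used in CNNs (cross-correlation, no flipping)."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the kernel with the patch under it and sum the result.
            total = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(k_h)
                for dj in range(k_w)
            )
            row.append(total)
        output.append(row)
    return output


# A tiny grayscale image: dark (0) on the left, bright (9) on the right.
image = [
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
]
# Hand-picked vertical-edge filter; a CNN would learn its own weights.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
print(conv2d(image, vertical_edge))  # [[0, 27, 27, 0]]
```

The response is large only where the dark-to-bright edge sits and zero over flat areas, which is the kind of pixel pattern a convolutional layer picks out.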

When you are doing object detection, you will also use a convolutional neural network. However, this neural network is slightly different from the ones that you have seen previously. 

The graphic below shows how a U-Net CNN works in theory.

U-Net CNN process

You will note from the diagram that you first pass through the layers of a standard convolutional network, which shrink the image down, and then work back up to recreate an image at the original size.

This process allows you to develop a segmentation map that labels each region of the image.

The U-Net CNN is one example of a neural network optimised for object detection.
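The down-then-up shape of a U-Net can be traced with simple arithmetic. The sketch below assumes a 256x256 input, 64 channels in the first stage, four downsampling steps and padded convolutions, so sizes halve and double exactly (the original U-Net used unpadded convolutions, so its sizes shrink slightly at each stage). These numbers are illustrative, not the configuration of any particular library.

```python
def trace_unet(size=256, base_channels=64, depth=4):
    """Trace (size, channels) of feature maps through a simplified U-Net.

    Assumes padded convolutions, so each stage exactly halves or doubles the size.
    """
    skips = []
    s, c = size, base_channels
    # Encoder (contracting path): shrink the image, grow the channels.
    for _ in range(depth):
        skips.append((s, c))      # saved for the matching decoder stage
        s, c = s // 2, c * 2
    path = [("bottleneck", s, c)]
    # Decoder (expanding path): grow the image back, shrink the channels.
    for skip_s, skip_c in reversed(skips):
        s, c = s * 2, c // 2      # up-convolution
        assert s == skip_s        # same size as the saved encoder map
        path.append(("up + skip", s, c + skip_c))  # channels after concatenation
        c = skip_c                # convolutions reduce the channel count again
    return skips, path


skips, path = trace_unet()
print(skips)  # [(256, 64), (128, 128), (64, 256), (32, 512)]
for step in path:
    print(step)
```

Each decoder stage is concatenated with the encoder map of the same size (the skip connection), which is why the final map can line up pixel for pixel with the input; a last 1x1 convolution would then turn the 64 channels into one score per class.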

2 Conventional Labelling Techniques used in object detection

The way that an image is labeled allows you to map its different parts to different objects. There are two ways that you can label an image: bounding boxes and masks.

When you are using a bounding box, you draw a box around the object that you are trying to label (like a boundary) and then apply a label to that box.


A mask is slightly different. The mask is essentially a color-coded version of the image where each colour is mapped to a different label. 

For example, in the image below, you will see a mask of a street image. 

Object detection labelling masks

You can see from this mask that all of the people are covered in orange. 

Each orange pixel is labeled as a person pixel.

When you are developing an object detection algorithm, you teach it to classify each pixel in your image as belonging to an item.

It learns this from the color-coded mask, which gives the correct label for every pixel.
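Here is a minimal sketch of how a colour-coded mask turns into per-pixel labels. The colour-to-label mapping is hypothetical (real datasets ship their own list of classes, and many store a class index per pixel rather than a colour), and the mask is a tiny hand-made example rather than a real image.

```python
from collections import Counter

# Hypothetical colour -> label mapping (RGB tuples), for illustration only.
COLOUR_TO_LABEL = {
    (0, 0, 0): "background",
    (255, 128, 0): "person",  # orange, as in the street example
    (0, 0, 255): "car",
}


def mask_to_labels(mask_rgb):
    """Turn a colour-coded mask (rows of RGB tuples) into per-pixel labels."""
    return [[COLOUR_TO_LABEL[pixel] for pixel in row] for row in mask_rgb]


K, O, B = (0, 0, 0), (255, 128, 0), (0, 0, 255)
mask = [
    [K, O, K, B],
    [K, O, K, B],
    [K, O, O, K],
]
labels = mask_to_labels(mask)
print(labels[0])  # ['background', 'person', 'background', 'car']
counts = Counter(label for row in labels for label in row)
print(counts["person"])  # 4
```

During training, the model's prediction for every pixel is compared with these labels, which is how it learns to "paint" people orange.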

Does this make sense?

Bounding Boxes vs. Masks

Let's take a look at the table below to better understand the difference between bounding boxes and masking labels.

Type of labeling | Description
Bounding boxes | Labeled box outlining the area of the object; more common in current research datasets; gives you the region plus a buffer, as not all items are rectangular
Masks | Color-coded, pixel-based map; newer labeling process; more accurate in theory, as the label matches the object's exact shape
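The "buffer" in the table can be measured. This sketch, using a hand-made mask where 1 marks the object and 0 the background, derives the tightest bounding box from a mask and reports what fraction of that box is actually object. For a non-rectangular shape, a large part of the box is background.

```python
def mask_to_box(mask, label):
    """Tightest (x_min, y_min, x_max, y_max) box around pixels equal to `label`."""
    coords = [
        (x, y)
        for y, row in enumerate(mask)
        for x, value in enumerate(row)
        if value == label
    ]
    if not coords:
        raise ValueError(f"label {label!r} not found in mask")
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def box_fill_ratio(mask, label):
    """Fraction of the box's pixels that really belong to the object."""
    x_min, y_min, x_max, y_max = mask_to_box(mask, label)
    box_area = (x_max - x_min + 1) * (y_max - y_min + 1)  # inclusive pixel coords
    object_pixels = sum(value == label for row in mask for value in row)
    return object_pixels / box_area


# 1 = object, 0 = background: an L-shaped, non-rectangular object.
mask = [
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 1],
]
print(mask_to_box(mask, 1))     # (1, 0, 3, 2)
print(box_fill_ratio(mask, 1))  # 0.5555555555555556, i.e. 5 of 9 box pixels
```

Going from a mask to a box is easy, but the reverse is not: the box alone cannot tell you which 4 of its 9 pixels are background.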

Object detection in real-time

Many of the object detection applications we have discussed need to identify objects in real time. For example, self-driving cars need to locate an object as soon as it appears.

Two algorithms are especially popular for real-time object detection.

These are the Faster R-CNN algorithm and the YOLO algorithm.


Faster R-CNN Object Detection Algorithm

The Faster R-CNN algorithm is built from two networks. The first is a region proposal network, which suggests regions of the image that may contain objects, and the second is a network that detects objects in those regions.

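The Faster R-CNN architecture places a set of nine anchors, or reference boxes of different sizes and shapes, at each position of the image's feature map to generate region proposals.

These anchors form a map of the image, and the region proposal network uses them to propose the different regions that may contain objects.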
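The nine anchors come from combining three box sizes with three aspect ratios. The sketch below generates them around a single point; the scales (128, 256 and 512 pixels) and ratios (1:2, 1:1, 2:1) are the defaults reported in the original Faster R-CNN paper, while the feature-map stride of 16 in `tile_anchors` is an assumption that depends on the backbone network.

```python
import math


def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return len(scales) * len(ratios) anchor boxes centred on (cx, cy).

    Boxes are (x_min, y_min, x_max, y_max). `ratio` is height / width, and all
    anchors of one scale share the same area, scale ** 2.
    """
    anchors = []
    for scale in scales:
        for ratio in ratios:
            w = scale / math.sqrt(ratio)
            h = scale * math.sqrt(ratio)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


def tile_anchors(feature_h, feature_w, stride=16):
    """Place the anchors at every feature-map cell (stride is an assumption)."""
    return [
        anchor
        for i in range(feature_h)
        for j in range(feature_w)
        for anchor in make_anchors(j * stride + stride / 2, i * stride + stride / 2)
    ]


print(len(make_anchors(300, 300)))  # 9
print(len(tile_anchors(2, 3)))      # 54 = 2 * 3 positions * 9 anchors
```

Repeating the nine anchors at every feature-map position gives the map of candidate regions; the region proposal network then scores each anchor for "object or not" and refines its coordinates.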

Once the region proposal network has proposed its boxes, each box is pooled to create a fixed-size feature map.

This pooling technique is called RoI (region of interest) pooling. During RoI pooling, you split the part of the feature map covered by each proposed box into a fixed number of roughly equal sections.

Once you have these sections, you apply max pooling to each one.
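Here is a minimal sketch of RoI max pooling on a single-channel feature map, written in plain Python rather than with a deep learning library. The 2x2 output size is an illustrative choice (Faster R-CNN uses larger grids such as 7x7), and bin edges are rounded down at the start and up at the end so that every bin contains at least one cell.

```python
def roi_max_pool(feature_map, roi, output_size=(2, 2)):
    """Max-pool one region of a 2D feature map into a fixed-size grid.

    roi is (x_min, y_min, x_max, y_max) in feature-map cells, inclusive.
    Bin starts are rounded down and ends rounded up, so no bin is empty.
    """
    x_min, y_min, x_max, y_max = roi
    out_h, out_w = output_size
    roi_h, roi_w = y_max - y_min + 1, x_max - x_min + 1
    pooled = []
    for i in range(out_h):
        y_start = y_min + (i * roi_h) // out_h
        y_end = y_min + -(-((i + 1) * roi_h) // out_h)  # ceiling division
        row = []
        for j in range(out_w):
            x_start = x_min + (j * roi_w) // out_w
            x_end = x_min + -(-((j + 1) * roi_w) // out_w)
            row.append(max(
                feature_map[y][x]
                for y in range(y_start, y_end)
                for x in range(x_start, x_end)
            ))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 1, 2, 3],
    [4, 5, 6, 7],
]
print(roi_max_pool(feature_map, (0, 0, 3, 3)))  # 4x4 region -> [[6, 8], [9, 7]]
print(roi_max_pool(feature_map, (1, 0, 3, 2)))  # 3x3 region -> [[7, 8], [7, 8]]
```

Both regions, although different in size, come out as a 2x2 grid, which is what lets the classifier that follows work with a fixed-size input.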

After max pooling, the regions are classified and the objects are identified.

YOLO Object Detection Algorithm

YOLO, or You Only Look Once, works slightly differently from a typical R-CNN.

The algorithm splits your image into a grid, for example 19 by 19. Once it has divided the image, it predicts bounding boxes and their probabilities for each of the squares in the grid.

This is different from R-CNN, which first proposes candidate regions and then classifies each one in a separate step.

In the YOLO algorithm, each square predicts bounding boxes for any object whose centre falls within that square, together with a probability (confidence score) for each box.
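In YOLO, the grid cell that contains the centre of an object is the one responsible for predicting it. The sketch below finds that cell for a box given in pixel coordinates; the 608x608 image size and 19x19 grid are illustrative assumptions (with those numbers, each cell covers 32x32 pixels).

```python
def responsible_cell(box, image_w, image_h, grid=19):
    """Return (row, col) of the grid cell that contains the box centre.

    box is (x_min, y_min, x_max, y_max) in pixels.
    """
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2
    cy = (y_min + y_max) / 2
    # Scale the centre to grid units; clamp so a centre on the far edge stays inside.
    col = min(int(cx / image_w * grid), grid - 1)
    row = min(int(cy / image_h * grid), grid - 1)
    return row, col


# A 64x128-pixel box in a 608x608 image with a 19x19 grid (32-pixel cells).
print(responsible_cell((100, 200, 164, 328), 608, 608))  # (8, 4)
```

Only that one cell is trained to predict this object, even though the box itself spans several cells.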

Once all of the predictions of boxes have been made, those with low probability are discarded. 

Once only the higher-probability bounding boxes are left, you merge overlapping predictions using a process called non-max suppression. Non-max suppression keeps the most probable bounding box for each object and discards other boxes that overlap it heavily.

Because the whole image is processed in a single pass, this you-only-look-once process makes the system really fast.
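The two clean-up steps described above, discarding low-probability boxes and then applying non-max suppression, can be sketched as follows. This is the common greedy form of non-max suppression, which uses intersection over union (IoU) to decide whether two boxes overlap enough to count as duplicates; the 0.5 thresholds are illustrative defaults, not values from a specific YOLO release.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_and_nms(boxes, scores, score_threshold=0.5, iou_threshold=0.5):
    """Drop low-confidence boxes, then apply greedy non-max suppression.

    Returns the indices of the boxes that are kept, highest score first.
    """
    candidates = [i for i, s in enumerate(scores) if s >= score_threshold]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    keep = []
    for i in candidates:
        # Keep a box only if it does not overlap heavily with a better one.
        if all(iou(boxes[i], boxes[k]) < iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [
    (10, 10, 60, 60),      # object A, best guess
    (12, 12, 62, 62),      # object A again, slightly shifted
    (100, 100, 150, 150),  # object B
    (105, 100, 155, 150),  # object B, low confidence
]
scores = [0.9, 0.8, 0.7, 0.3]
print(filter_and_nms(boxes, scores))  # [0, 2]
```

Box 1 is removed because it overlaps box 0 heavily (IoU of about 0.85), and box 3 never reaches the suppression step because its score is below the threshold.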


How to implement object detection step by step in PyTorch and fastai

Now it is time to look at the code. To get started with object detection we will use the fastai library.

Advantages of fastai versus TensorFlow

  • Less code: you only need a few lines of code
  • Does it all for you: the tricky data organization is handled for you

All the code you will see here is taken from module 3 of the fastai course Practical Deep Learning for Coders.

Step by step implementation

Pre-work: Upload the data

First things first, you need to get the data for your images and bounding boxes into your chosen interface.

Step 1: Create the data bunch

A data bunch is a fastai object that stores your training data, labels, preprocessing information and test data.
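Under the hood, a data bunch does bookkeeping along these lines: pair each image with its label file and hold some pairs back for validation. The sketch below is plain Python, not the fastai API; the `images/` and `labels/` folders, the `_P` file-name suffix and the 20% validation split are hypothetical choices for illustration.

```python
import random
from pathlib import PurePosixPath


def mask_path_for(image_path):
    """Hypothetical convention: data/images/0001.png -> data/labels/0001_P.png."""
    p = PurePosixPath(image_path)
    return str(p.parent.parent / "labels" / f"{p.stem}_P{p.suffix}")


def split_pairs(image_paths, valid_pct=0.2, seed=42):
    """Pair each image with its mask, then hold out a random validation set."""
    pairs = [(img, mask_path_for(img)) for img in image_paths]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(pairs)
    n_valid = int(len(pairs) * valid_pct)
    return pairs[n_valid:], pairs[:n_valid]


images = [f"data/images/{i:04d}.png" for i in range(10)]
train, valid = split_pairs(images)
print(len(train), len(valid))    # 8 2
print(mask_path_for(images[0]))  # data/labels/0000_P.png
```

Keeping the validation pairs out of training is what lets you trust the performance numbers you look at later.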


Step 2: Create the learner

Once you have the data ready, you need to create the U-Net learner that will learn to identify the objects in your images.


Step 3: Optimize your learning rate and train the algorithm

Once you have created the learner, it is time to optimize the learning rate hyperparameter to improve your object detection algorithm.
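To see why the learning rate matters so much, here is a simplified learning-rate sweep on a toy one-parameter problem: minimising (w - 3)^2 with plain gradient descent. It is not fastai's learning-rate finder, which instead raises the learning rate step by step during one short training run and plots the loss, but it shows the same trade-off: too small a rate learns slowly, and too large a rate makes the loss explode.

```python
def final_loss(lr, steps=20):
    """Run plain gradient descent on loss(w) = (w - 3) ** 2 from w = 0."""
    w = 0.0
    for _ in range(steps):
        grad = 2 * (w - 3)  # derivative of (w - 3) ** 2
        w -= lr * grad
    return (w - 3) ** 2


candidates = [0.001, 0.01, 0.1, 0.5, 1.5]
results = {lr: final_loss(lr) for lr in candidates}
for lr, loss in results.items():
    print(f"lr={lr:<6} final loss={loss:.4g}")
best = min(results, key=results.get)
print("best:", best)  # best: 0.5
```

On this toy problem 0.5 happens to reach the minimum in one step; with real networks the usual advice is to pick a rate where the loss is still falling steeply, somewhat below the point where it starts to blow up.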


Step 4: Further optimization – if you want

You can improve the performance of your algorithm by unfreezing the pre-trained model used in the learner. The original set-up we saw used a pre-trained ResNet model, which lets us benefit from transfer learning. Its pre-trained layers start out frozen, so at first only the newly added layers are trained.

However, once we have started training on our own data, we can unfreeze those layers and fine-tune them for our data.
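When you unfreeze, fastai v1 lets you pass a range of learning rates such as `slice(low, high)` so that the early, pre-trained layers change less than the later ones. As we understand it, the range is spread geometrically across the model's layer groups; the sketch below reproduces that idea in plain Python and is not fastai's own code. The base rate of 3e-3 and the three layer groups are illustrative assumptions.

```python
def spread_learning_rates(lr_low, lr_high, n_groups):
    """Geometrically spaced learning rates, one per layer group.

    The first (earliest, pre-trained) group gets lr_low and the last gets lr_high.
    """
    if n_groups == 1:
        return [lr_high]
    step = (lr_high / lr_low) ** (1 / (n_groups - 1))
    return [lr_low * step ** i for i in range(n_groups)]


lr = 3e-3  # illustrative base learning rate
rates = spread_learning_rates(lr / 400, lr / 4, n_groups=3)
print([f"{r:.1e}" for r in rates])  # ['7.5e-06', '7.5e-05', '7.5e-04']
```

Small rates for the early layers protect the general features learned on ImageNet, while larger rates let the later layers adapt to your own images.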


Another way to optimize and improve performance is to start training on images resized to be smaller, and then double their size later in the training process.
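One detail matters when you resize segmentation data for this kind of progressive training: the masks must be resized with nearest-neighbour sampling, so that every pixel keeps a real class id. Interpolating between labels 0 and 2, for example, could produce 1, which is a different class. The sketch below uses a tiny hand-made mask and hypothetical sizes.

```python
def resize_nearest(mask, new_h, new_w):
    """Nearest-neighbour resize: every output pixel copies an existing label."""
    old_h, old_w = len(mask), len(mask[0])
    return [
        [mask[y * old_h // new_h][x * old_w // new_w] for x in range(new_w)]
        for y in range(new_h)
    ]


mask = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 2, 2],
    [2, 2, 2, 2],
]
small = resize_nearest(mask, 2, 2)   # stage 1: train at half size
print(small)                         # [[0, 1], [2, 2]]
full = resize_nearest(small, 4, 4)   # stage 2: double back to full size
print(full == mask)                  # True
```

The images themselves can be resized with smooth interpolation; only the label masks need this treatment.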

Now take a look at the performance

Finally, it is time to see how your object detection algorithm performed!
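A common way to score a segmentation model is pixel accuracy: the share of pixels whose predicted label matches the mask. Datasets often mark some pixels as unlabeled, and it is usual to leave those out of the score. The sketch below computes this in plain Python; the label id 9 used for the unlabeled class is a made-up example.

```python
def pixel_accuracy(target, prediction, ignore_label=None):
    """Share of pixels whose predicted label matches the mask label.

    Pixels whose true label equals `ignore_label` (e.g. an unlabeled "void"
    class) are left out of the calculation.
    """
    correct = total = 0
    for t_row, p_row in zip(target, prediction):
        for t, p in zip(t_row, p_row):
            if t == ignore_label:
                continue
            total += 1
            correct += t == p
    return correct / total if total else 0.0


target = [[0, 1, 1], [2, 2, 9]]      # 9 = hypothetical "unlabeled" id
prediction = [[0, 1, 0], [2, 2, 1]]
print(pixel_accuracy(target, prediction, ignore_label=9))  # 0.8 (4 of 5 pixels)
```

Pixel accuracy is easy to read but, like the per-label example in the ethics section, it can hide poor results on small classes, so it is worth checking per-class scores as well.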



To summarise what we know now!

And that's the end of our introduction to object detection algorithms.

Today you have learned:

  • How object detection works
  • The different ways of labeling objects
  • How to implement object detection algorithms with fastai

What will you use your new skills for? Let me know in the comments.

