Obscured object detection is a research area within computer vision.
This installment of Academic Papers Made Simple discusses a recent paper on obscured object detection. The paper proposes a new approach for cases where the target object is only partly visible.
What is ‘Academic Papers Made Simple’?
In Academic Papers Made Simple, we take a look at academic papers and make them easy to digest.
Reading academic work can be intimidating when you first dive into a technical area. Academic Papers Made Simple will help you dip your toe into the technological world while building up your knowledge.
You can stay on top of the latest developments in the field of AI and machine learning without having to dive deeply into lengthy papers.
Ready to learn more about obscured object detection?
Let’s jump in!
What is the paper we are discussing?
The paper in question is ‘DeepVoting: A Robust and Explainable Deep Network for Semantic Part Detection under Partial Occlusion’.
What AI bucket does obscured object detection fall into?
Obscured object detection (OOD) falls into the computer vision area of AI.
OOD is crucial as it represents a more realistic view of the world. Let’s imagine the scenario of a self-driving car.
As the car is traveling along a real-world road, it will be continually identifying objects to avoid. However, at times these objects are partially obscured, for example, a bicycle coming round the corner from behind a tree.
In these types of scenarios, the self-driving car must be able to recognize the bicycle quickly so that it does not hit it.
The quicker, the better.
What is the paper about?
In the paper, the authors cover two main points:
- The use of the Obscured Object dataset to train computer vision models (as opposed to un-obscured datasets of labeled objects)
- A new process to classify objects within a convolutional neural network called DeepVoting
What algorithms were used in obscured object detection?
The team uses a convolutional neural network (CNN) to which they add two layers.
These new layers are:
- Visual Concept Layer
- Voting Layer
The visual concept layer uses a 1×1 kernel to perform the convolution, so that each position in the feature map is considered on its own. The output of this layer is a visual map that identifies different parts of the overall object and maps them out in space depending on the distance between them.
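To see why a 1×1 kernel treats each position on its own, here is a minimal pure-Python sketch. It is not the paper's code: the function name `conv1x1`, the toy feature map, and the two "visual concept" filters are all made up for illustration. The point is only that each output value depends on the channels at one position and never on its neighbours.

```python
def conv1x1(feature_map, weights):
    """Apply a 1x1 convolution (illustrative helper, not from the paper).

    feature_map: H x W x C_in nested lists
    weights:     C_out x C_in nested lists, one row per output channel
    Returns an H x W x C_out nested list.
    """
    return [
        [
            # Each output channel is a weighted sum over the channels
            # of this single position; no neighbouring position is read.
            [sum(w * x for w, x in zip(filter_w, pixel)) for filter_w in weights]
            for pixel in row
        ]
        for row in feature_map
    ]


# Toy 2x2 feature map with 3 input channels (values are made up).
features = [
    [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]],
    [[3.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
]

# Two hypothetical "visual concept" filters over the 3 input channels.
concept_filters = [
    [1.0, 0.0, 1.0],
    [0.0, 2.0, 0.0],
]

concept_map = conv1x1(features, concept_filters)
print(concept_map[0][0])  # response of both filters at the top-left position
```

The output keeps the same height and width as the input, which is what lets the next layer reason about where each visual cue sits in space.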
The voting layer then allows the CNN to vote on what the parts are. This layer also allows the algorithm to predict which part of the target object is obscured.
The network can do this because it has a good understanding of the different parts of the object and how they relate to one another in space.
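The voting idea can be sketched in a few lines of Python. This is a hand-written simplification and not the layer from the paper: the cue names, the offsets in `OFFSETS_TO_WHEEL`, the confidences, and the function `vote_for_part` are all hypothetical. Each visible cue casts a vote for where a hidden part (here, a bicycle wheel) should be, and the location with the most support wins.

```python
from collections import defaultdict

# Hypothetical offsets: where the wheel centre is expected relative to
# each visual cue, as (d_row, d_col). In a real network these spatial
# relationships would be learned; the values here are made up.
OFFSETS_TO_WHEEL = {
    "handlebar": (3, 1),
    "pedal": (1, -2),
    "saddle": (3, -3),
}


def vote_for_part(detections, offsets):
    """Accumulate votes for a part location (illustrative helper).

    detections: list of (cue_name, row, col, confidence)
    offsets:    cue_name -> (d_row, d_col)
    Returns (best_location, votes), or (None, {}) if nothing voted.
    """
    votes = defaultdict(float)
    for cue, row, col, confidence in detections:
        if cue not in offsets:
            continue  # this cue says nothing about the target part
        d_row, d_col = offsets[cue]
        votes[(row + d_row, col + d_col)] += confidence
    if not votes:
        return None, {}
    best = max(votes, key=votes.get)
    return best, dict(votes)


# The wheel itself is hidden, so only the surrounding cues are detected.
detections = [
    ("handlebar", 2, 4, 0.9),
    ("pedal", 4, 7, 0.8),
    ("saddle", 2, 9, 0.6),
]

wheel_location, votes = vote_for_part(detections, OFFSETS_TO_WHEEL)
print(wheel_location)  # location with the highest accumulated vote
```

In this toy setup two cues agree on the same location, so it outvotes the third. Because each vote can be traced back to a specific cue, the prediction is also easy to explain.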
What did the team from Baidu do that’s innovative?
The DeepVoting framework (a visual concept layer plus a voting layer within a CNN) is innovative because it is fully connected and uses voting based on multiple visual cues.
For example, it knows that the obscured object should be a wheel on a bicycle because it is near the handlebars and pedals.
Being able to identify parts means that it is better at identifying full objects in datasets. Furthermore, it can achieve state-of-the-art results at a higher speed than the current state-of-the-art method, Faster R-CNN.
What can a budding data scientist take from this paper and apply in their work?
This paper, like so many other innovative papers before it, is very intuitive.
Many lovers of neural networks will know that they try to replicate processes in the human brain.
This paper on obscured object detection is very similar.
Here they try to mimic the way human beings identify partially visible objects. The team looks at parts of the object and how they sit with one another. Then they use this information to identify the object.
In addition to mimicking human behavior, another takeaway for me is the importance of your dataset.
By using the right dataset, they can create a better-performing algorithm that is more fit for purpose in the real world.
This care when choosing a dataset is something we can all use.
What do you think about this paper? Let me know in the comments below.