An Overview of Some of the Most Widely Used Object Recognition Algorithms
Object recognition is a sub-branch of computer vision (CV) concerned with finding and identifying objects in an image or video sequence. Ideally, it works the way humans do: we recognize a multitude of objects in images with little effort, even though the objects may appear from different viewpoints, at many different sizes and scales, or translated and rotated. It is one of the hottest research topics in CV and is used in many applications, ranging from robotics to everyday e-commerce. Choosing the technique best suited to their work can be overwhelming for a novice researcher. Therefore, in this article I list and explain, in simple English and without much indigestible math, some of the most widely used and arguably most popular object detection/recognition algorithms.
SSD (Single Shot MultiBox Detector)
SSD is a unified framework for object detection with a single network. SSD discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. At prediction time, the network generates scores for the presence of each object category in each default box and produces adjustments to the box to better match the object shape. SSD is simple relative to methods that require object proposals because it completely eliminates proposal generation and the subsequent pixel or feature resampling stages. This makes SSD easy to train and straightforward to integrate into systems that require a detection component. There are various flavors of SSD; see the original paper and its released code for more details about the technique and its performance in practical scenarios. The accompanying code repository also explains how to use the code on its own or within a larger application.
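To make the idea of default boxes concrete, here is a minimal Python sketch of how a grid of default boxes could be laid out. It assumes the scale rule from the SSD paper (per-map scales spaced linearly from s_min = 0.2 to s_max = 0.9, width s·√a and height s/√a for aspect ratio a, plus one extra square box at scale √(s_k·s_{k+1})) but uses a reduced aspect-ratio set {1, 2, 1/2}. The function names `ssd_scales` and `default_boxes`, the SSD300-like list of feature-map sizes, and the use of 1.0 as the "next scale" for the last map are illustrative assumptions, not the official implementation.

```python
import math
from itertools import product


def ssd_scales(num_maps, s_min=0.2, s_max=0.9):
    """Scales spaced linearly between s_min and s_max, one per feature map."""
    if num_maps == 1:
        return [s_min]
    step = (s_max - s_min) / (num_maps - 1)
    return [s_min + step * k for k in range(num_maps)]


def default_boxes(fmap_size, scale, next_scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Default boxes (cx, cy, w, h), normalised to [0, 1], for a square feature map."""
    boxes = []
    for row, col in product(range(fmap_size), repeat=2):
        # Each box is centred on its feature-map cell.
        cx = (col + 0.5) / fmap_size
        cy = (row + 0.5) / fmap_size
        for ar in aspect_ratios:
            boxes.append((cx, cy, scale * math.sqrt(ar), scale / math.sqrt(ar)))
        # Extra square box at an intermediate scale between this map and the next.
        extra = math.sqrt(scale * next_scale)
        boxes.append((cx, cy, extra, extra))
    return boxes


# Usage with an SSD300-like set of feature-map sizes (assumption).
fmap_sizes = [38, 19, 10, 5, 3, 1]
scales = ssd_scales(len(fmap_sizes))
all_boxes = []
for k, size in enumerate(fmap_sizes):
    # Hypothetical choice: use 1.0 as the "next" scale for the last map.
    next_scale = scales[k + 1] if k + 1 < len(scales) else 1.0
    all_boxes.extend(default_boxes(size, scales[k], next_scale))
print(len(all_boxes))  # 4 boxes per location x 1940 locations = 7760
```

Small maps (38×38) get small boxes and coarse maps (1×1) get large ones, which is how SSD covers objects of many sizes without a proposal stage. In the full detector, a small convolutional filter predicts per-class scores and four box offsets for every one of these default boxes.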
YOLO (You Only Look Once)
YOLO applies a single neural network to the full image. The network divides the image into regions and predicts bounding boxes and probabilities for each region; the bounding boxes are then weighted by the predicted probabilities. YOLO is a relatively fast technique: the base YOLO model processes images in real time at 45 fps (frames per second), and a smaller version of the network, Fast YOLO, processes 155 fps. On the downside, the original model, trained on the PASCAL VOC dataset, can only detect the following 20 object categories:
- person
- bird, cat, cow, dog, horse, sheep
- aeroplane, bicycle, boat, bus, car, motorbike, train
- bottle, chair, dining table, potted plant, sofa, tv/monitor
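The phrase "bounding boxes weighted by the predicted probabilities" can be illustrated by decoding a YOLOv1-style output grid. The sketch below assumes the original paper's setup (an S×S grid with S = 7, B = 2 boxes per cell and C = 20 classes), where each box's class-specific score is its confidence multiplied by the cell's class probability. The per-cell layout (B blocks of `[x, y, w, h, confidence]` followed by C class probabilities), the already-decoded image-relative `w` and `h`, the function name `decode_yolo_v1` and the 0.2 threshold are assumptions for illustration; the layout used by the official Darknet code may differ.

```python
def decode_yolo_v1(output, S=7, B=2, C=20, threshold=0.2):
    """Turn a YOLOv1-style S x S grid of predictions into scored detections.

    output[row][col] is assumed to be a flat list of length B * 5 + C laid out as
    B blocks of [x, y, w, h, confidence] followed by C class probabilities.
    x and y are offsets within the cell; w and h are assumed to be already
    relative to the whole image. Returns (class, score, cx, cy, w, h) tuples.
    """
    detections = []
    for row in range(S):
        for col in range(S):
            cell = output[row][col]
            class_probs = cell[B * 5:B * 5 + C]  # Pr(class | object), shared by the cell's boxes
            for b in range(B):
                x, y, w, h, conf = cell[b * 5:(b + 1) * 5]
                # Convert the in-cell offset to an image-relative box centre.
                cx = (col + x) / S
                cy = (row + y) / S
                for cls, p in enumerate(class_probs):
                    score = p * conf  # class-specific confidence score
                    if score >= threshold:
                        detections.append((cls, score, cx, cy, w, h))
    return detections


# Usage: an all-zero grid with one confident box in cell (row 3, col 4).
S, B, C = 7, 2, 20
grid = [[[0.0] * (B * 5 + C) for _ in range(S)] for _ in range(S)]
grid[3][4][0:5] = [0.5, 0.5, 0.3, 0.4, 0.9]  # first box: cell centre, confidence 0.9
grid[3][4][B * 5 + 11] = 0.8                  # class 11 is "dog" in the usual VOC ordering
detections = decode_yolo_v1(grid, S, B, C)
print(detections)  # one detection: class 11 with a score of about 0.72
```

Because the whole grid comes out of one forward pass, decoding is just a loop over cells, which is part of why YOLO is fast. A complete detector would also remove duplicate boxes (for example with non-maximum suppression), which this sketch omits.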
There are various flavors of YOLO besides Fast YOLO; see the original paper and its released code for more details about the technique and its performance in practical scenarios. The accompanying project page also explains how to use the code on its own or within a larger application.
R-FCN (Region-based Fully Convolutional Networks)
R-FCN is an accurate and efficient region-based object detection framework built on deep fully convolutional networks. In contrast to earlier region-based detectors such as Fast/Faster R-CNN, which apply a costly per-region sub-network hundreds of times, R-FCN is fully convolutional, with almost all computation shared across the entire image. R-FCN can naturally adopt powerful fully convolutional image classifier backbones, such as Residual Networks (ResNets), for object detection. The official code has been tested on Windows 7/8 64-bit, Windows Server 2012 R2, and Ubuntu 14.04 with MATLAB 2014a. R-FCN comes with a variety of improvements; see the original paper and its released code for more details about the technique and its performance in practical scenarios. The accompanying code repository also explains how to use the code on its own or within a larger application.
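The R-FCN paper achieves this sharing with what it calls position-sensitive score maps: the fully convolutional backbone produces k²(C + 1) score maps for the whole image once, and each region of interest (RoI) then only has to average-pool each of its k×k bins from that bin's own dedicated map and average the bins to vote. The following pure-Python sketch shows that per-region step on toy data; the function names `ps_roi_pool` and `softmax`, the map sizes and the toy score maps are hypothetical, and the paper's bounding-box regression branch is omitted.

```python
import math


def ps_roi_pool(score_maps, roi, k, num_classes):
    """Position-sensitive RoI pooling with average voting (simplified R-FCN step).

    score_maps: k * k * num_classes 2-D maps (lists of rows) computed once per image;
        map (i * k + j) * num_classes + c scores class c for bin (i, j).
    roi: (x0, y0, x1, y1) in feature-map cells, assumed to lie inside the maps.
    Returns one averaged score per class (class 0 is treated as background).
    """
    x0, y0, x1, y1 = roi
    bin_w = (x1 - x0) / k
    bin_h = (y1 - y0) / k
    votes = [0.0] * num_classes
    for i in range(k):          # bin row
        for j in range(k):      # bin column
            # Integer bounds of bin (i, j); every bin covers at least one cell.
            ya = math.floor(y0 + i * bin_h)
            yb = max(ya + 1, math.ceil(y0 + (i + 1) * bin_h))
            xa = math.floor(x0 + j * bin_w)
            xb = max(xa + 1, math.ceil(x0 + (j + 1) * bin_w))
            for c in range(num_classes):
                # Each (bin, class) pair reads only from its own dedicated map.
                fmap = score_maps[(i * k + j) * num_classes + c]
                vals = [fmap[y][x] for y in range(ya, yb) for x in range(xa, xb)]
                votes[c] += sum(vals) / len(vals)  # average pooling inside the bin
    return [v / (k * k) for v in votes]        # average vote over the k x k bins


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy data: k = 3 bins per side, 2 classes (background, object), 6 x 6 maps.
k, num_classes, H, W = 3, 2, 6, 6
maps = []
for i in range(k):
    for j in range(k):
        for c in range(num_classes):
            # The object-class map for bin (i, j) fires only in the matching 2 x 2 block.
            maps.append([[1.0 if c == 1 and 2 * i <= y < 2 * i + 2 and 2 * j <= x < 2 * j + 2
                          else 0.0 for x in range(W)] for y in range(H)])

aligned = ps_roi_pool(maps, (0, 0, 6, 6), k, num_classes)  # [0.0, 1.0]
shifted = ps_roi_pool(maps, (0, 0, 3, 3), k, num_classes)  # object score drops to 1/9
print(softmax(aligned), softmax(shifted))
```

Because each bin reads from a map meant to respond to one relative part of an object (top-left, centre and so on), an RoI that is well aligned with the object collects high votes in all nine bins, while a misaligned RoI collects far fewer. The expensive convolutions run once per image; the per-region work is only this pooling and averaging.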
Quite recently, Facebook proposed the Mask R-CNN technique, which outperformed all of the techniques mentioned above and more, but its code has not yet been open-sourced for public use.
This article has introduced some of the most widely used and discussed object recognition algorithms. Many factors drive the choice of an appropriate algorithm, including, but not limited to, available resources, computation time, the complexity of the task, and the required accuracy.