Wednesday, 8 January 2020

Object Detection


Classify and locate multiple objects in an image. This task differs from Classification + Localization in that the number of outputs can vary.


Input:
  • image
  • fixed set of labels (categories/classes)


Output:
  • bounding boxes around each object
  • for each bounding box, a confidence score that describes how confident the model is that the box contains an object of a certain class

We don't know ahead of time how many objects an image will contain.

Figure: Object Detection (SSD method used)


If there is one object, the system outputs 5 numbers (a class prediction plus 4 bounding box coordinates). If there are N objects, it outputs N * 5 numbers. Because the output size is not fixed, it is very tricky to frame Object Detection as a regression problem.
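The variable output size can be illustrated with a small sketch. The `Detection` class, the `flatten` helper and the (x, y, w, h) box parameterization are assumptions made for this example only; they do not come from any particular detector.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Detection:
    label: int  # class prediction
    x: float    # box coordinates; (x, y, w, h) is an assumed convention
    y: float
    w: float
    h: float


def flatten(detections: List[Detection]) -> List[float]:
    """Concatenate all detections into one flat output vector."""
    out: List[float] = []
    for d in detections:
        out.extend([d.label, d.x, d.y, d.w, d.h])
    return out


one_object = [Detection(0, 10, 20, 30, 40)]
three_objects = one_object + [Detection(1, 5, 5, 15, 15), Detection(2, 50, 60, 20, 10)]

# The output length depends on how many objects the image contains.
print(len(flatten(one_object)), len(flatten(three_objects)))  # 5 15
```

A plain regression model has a fixed number of outputs, so it cannot directly produce a vector whose length changes from image to image.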

One of the early attempts at Object Detection was Haar cascades, proposed by Viola and Jones in 2001. High-quality results, however, came only after deep learning was introduced.

Performance of Object Detection systems (measured in mAP, mean Average Precision) had been increasing but stagnated at around 40% in the years up to 2012. Once deep CNNs started being used, mAP jumped to over 50% and has kept increasing since, reaching over 90% nowadays.

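Non-maximum suppression is a post-processing step that removes redundant detections. Bounding boxes whose confidence score is below a pre-set threshold are discarded, and among the remaining boxes that overlap heavily (high Intersection over Union), only the one with the highest score is kept.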
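A minimal sketch of this post-processing step in plain Python. It first drops boxes below a score threshold, then greedily keeps the highest-scoring box and suppresses any box that overlaps an already kept one too much. The corner box format (x1, y1, x2, y2), the function names and both threshold defaults are assumptions for illustration, not the settings of any specific detector.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed corner format


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(
    boxes: Sequence[Box],
    scores: Sequence[float],
    score_threshold: float = 0.5,  # assumed default
    iou_threshold: float = 0.5,    # assumed default
) -> List[int]:
    """Return indices of the boxes that survive, highest score first."""
    # Step 1: discard low-confidence boxes.
    candidates = [i for i, s in enumerate(scores) if s >= score_threshold]
    # Step 2: visit the rest from most to least confident.
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in candidates:
        # Keep a box only if it does not overlap a kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept


boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.2]
kept = non_max_suppression(boxes, scores)
print(kept)  # [0, 2]
```

In this example box 1 is suppressed because it overlaps box 0 with an IoU of about 0.82, and box 3 is dropped by the score threshold. In a multi-class detector this procedure is typically applied per class.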

Object detection models can be grouped in the following way:

  • Traditional:
    • 3 stages:
      • Informative Region Selection (generation of candidate bounding boxes): sliding window
      • Feature extraction: SIFT (Scale-Invariant Feature Transform), HOG, Haar-like
      • Classification: SVM, AdaBoost, Deformable Part-based Model (DPM)
    • Examples:
      • Haar cascade classifier
      • Histogram of Oriented Gradients (HOG) features
    • Problems:
      • generating bounding boxes with a sliding window is inefficient, and the resulting boxes are redundant and inaccurate
      • manually engineered low-level feature descriptors
      • neither features nor bounding boxes are learned
  • Deep Learning (DNN)-based:
    • emerged with DNN/CNN (in 2012)
    • 2 types:
      • Two-stage (multi-shot) object detectors
        • 2 phases:
          • propose regions
          • for each region sequentially perform classification (find class probabilities) and regression (bounding box coordinates)
        • Examples: sliding window, region-proposal based (R-CNN, Fast R-CNN, Faster R-CNN), SPP-Net
      • One-stage (single-shot) object detectors
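
The sliding-window region selection listed under the traditional pipeline can be sketched as follows. The function name, the fixed window size and the stride are assumptions chosen for illustration; the sketch only generates candidate boxes and does not extract features or classify them.

```python
from typing import Iterator, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), assumed corner format


def sliding_windows(
    image_w: int, image_h: int, window_w: int, window_h: int, stride: int
) -> Iterator[Box]:
    """Yield every window position that fits fully inside the image."""
    for y in range(0, image_h - window_h + 1, stride):
        for x in range(0, image_w - window_w + 1, stride):
            yield (x, y, x + window_w, y + window_h)


# A 64x48 image scanned with a 32x32 window and a stride of 16 pixels.
windows = list(sliding_windows(64, 48, 32, 32, 16))
print(len(windows))  # 6
print(windows[0], windows[-1])  # (0, 0, 32, 32) (32, 16, 64, 48)
```

Even this tiny image yields 6 candidates for a single window size. Covering objects of different sizes and aspect ratios requires repeating the scan with many window shapes, which is why this approach produces so many redundant boxes.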


    Fei-Fei Li, Justin Johnson, Serena Yeung: "Convolutional Neural Networks for Visual Recognition", Lecture 11: Detection and Segmentation (Stanford University School of Engineering, YouTube)

    "Computer Vision: Crash Course Computer Science #35" (YouTube)

    Paul Viola, Michael Jones: "Rapid Object Detection using a Boosted Cascade of Simple Features" (2001)

    Zhong-Qiu Zhao, Peng Zheng, Shou-tao Xu, Xindong Wu: "Object Detection with Deep Learning: A Review" (Apr 2019)
