Tuesday, 7 January 2020

Object Localization (Classification with Localization)


Goal: predict the main subject of the image and its location.

Input:

  • image
  • list of labels (categories/classes)

Output:

  • predicted class of the main subject of the image
  • predicted position of that object in the image (its bounding box: the minimal rectangle that completely contains it)
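
As a small illustration of what "minimal rectangle" means, the sketch below computes an axis-aligned bounding box from a set of object pixels and converts it to the center/width/height form used later in this post. The function names and the sample pixel list are hypothetical and chosen only for this example.

```python
from typing import Iterable, Tuple

Point = Tuple[int, int]  # (x, y) pixel coordinates


def bounding_box(pixels: Iterable[Point]) -> Tuple[int, int, int, int]:
    """Return the minimal axis-aligned box (x_min, y_min, x_max, y_max)
    that contains every given pixel."""
    xs, ys = zip(*pixels)
    return min(xs), min(ys), max(xs), max(ys)


def corners_to_center(box: Tuple[int, int, int, int]):
    """Convert (x_min, y_min, x_max, y_max) to (center_x, center_y, width, height)."""
    x_min, y_min, x_max, y_max = box
    width = x_max - x_min
    height = y_max - y_min
    return x_min + width / 2, y_min + height / 2, width, height


# Hypothetical pixels belonging to the main subject of an image.
object_pixels = [(12, 30), (40, 18), (25, 55), (33, 41)]
box = bounding_box(object_pixels)
center_box = corners_to_center(box)
print(box, center_box)
```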


Traditional approach: feature detection (HOG, Haar-like, ...) + classification (SVM, ...)

CNN approach: feature extraction + classification and bounding box prediction (regression).
The model learns both the class and the location.


Typical architecture:

A CNN whose feature vector is fully connected both to a softmax layer (the classifier, which outputs class probabilities) and to a 4-node layer (the regressor, which outputs the bounding box coordinates and dimensions):

  • input layer
  • CNN (e.g. AlexNet)
  • feature vector (the network's final hidden representation, which summarizes the content of the image); 4096 nodes in AlexNet
  • fully connected layer that outputs class scores: connects the 4096 feature vector nodes to e.g. 1000 nodes, one per class; this is the classification problem
  • another fully connected layer that outputs the bounding box coordinates: connects the 4096 feature vector nodes to 4 nodes (height, width and the two coordinates of the center) in the Box Coordinates layer; this treats localization as a regression problem

    Fully supervised setting: for each image we have an annotated ground truth (correct) label and box coordinates.
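
The two output layers described above can be sketched with NumPy as two linear maps applied to the same feature vector. This is only an illustration: the weights are random rather than learned, the feature vector is a random stand-in for the real CNN output, and all names (`W_cls`, `W_box`, `localization_heads`) are made up for this example.

```python
import numpy as np

FEATURE_DIM = 4096  # size of the feature vector
NUM_CLASSES = 1000  # e.g. 1000 classes
BOX_DIM = 4         # center x, center y, width, height

rng = np.random.default_rng(0)

# Hypothetical, randomly initialised parameters; a real model learns these.
W_cls = rng.normal(0.0, 0.01, size=(FEATURE_DIM, NUM_CLASSES))
b_cls = np.zeros(NUM_CLASSES)
W_box = rng.normal(0.0, 0.01, size=(FEATURE_DIM, BOX_DIM))
b_box = np.zeros(BOX_DIM)


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    shifted = scores - scores.max()  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()


def localization_heads(features):
    """Apply both fully connected heads to one shared feature vector."""
    class_probs = softmax(features @ W_cls + b_cls)  # classification head
    box = features @ W_box + b_box                   # regression head
    return class_probs, box


features = rng.normal(size=FEATURE_DIM)  # stand-in for the CNN output
class_probs, box = localization_heads(features)
print(class_probs.shape, box.shape)
```

Both heads read the same features, which is why a single backbone can be trained for classification and localization at once.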

    Loss Function

    During training (backpropagation), assuming a fully supervised setting, we have two losses:
    • one for the predicted category, which measures the difference between the correct label and the predicted class scores: Softmax Loss (this is the cross-entropy loss, the standard loss function for a softmax layer [Is the softmax loss the same as the cross-entropy loss? - Quora])
    • another for the predicted box coordinates: L2 (least squares error) loss, which measures the dissimilarity between the predicted and ground truth bounding boxes [What Are L1 and L2 Loss Functions?]
    • the total loss is a multi-task loss: a weighted sum of these two losses
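
A minimal sketch of the multi-task loss for a single training example, in plain Python. The post only says the total is a weighted sum, so the `box_weight` parameter, its default value, and the sample numbers below are assumptions made for illustration.

```python
import math


def softmax_cross_entropy(scores, true_class):
    """Cross-entropy between softmax(scores) and the correct class index."""
    max_score = max(scores)  # subtract the max for numerical stability
    log_sum_exp = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
    return log_sum_exp - scores[true_class]


def l2_loss(pred_box, true_box):
    """Sum of squared differences between predicted and ground truth box values."""
    return sum((p - t) ** 2 for p, t in zip(pred_box, true_box))


def multi_task_loss(scores, true_class, pred_box, true_box, box_weight=1.0):
    """Weighted sum of the classification and localization losses.
    box_weight is a hypothetical hyperparameter balancing the two terms."""
    cls = softmax_cross_entropy(scores, true_class)
    loc = l2_loss(pred_box, true_box)
    return cls + box_weight * loc


# Hypothetical example: 3 classes, boxes given as (cx, cy, w, h) in relative units.
scores = [2.0, 0.5, -1.0]
pred_box = [0.5, 0.5, 0.2, 0.3]
true_box = [0.5, 0.4, 0.2, 0.4]
total = multi_task_loss(scores, 0, pred_box, true_box, box_weight=0.5)
print(total)
```

The weight matters because the two losses live on different scales; in practice it is tuned like any other hyperparameter.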

      Human Pose Estimation

      This idea of predicting a fixed number of positions in the image is also applied to Human Pose Estimation:

      • input: person in the image
      • output: position/coordinates of the joints (e.g. 14 joints: left/right foot, knee, hip, shoulder, elbow, hand; neck, head top)
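
With 14 joints and two coordinates per joint, the regression output has 28 values. The sketch below decodes such a flat output vector into named joints and computes a loss, assuming the same L2 regression loss as for bounding boxes; the joint ordering and function names are hypothetical.

```python
JOINTS = [
    "left_foot", "right_foot", "left_knee", "right_knee",
    "left_hip", "right_hip", "left_shoulder", "right_shoulder",
    "left_elbow", "right_elbow", "left_hand", "right_hand",
    "neck", "head_top",
]


def decode_pose(outputs):
    """Map a flat list [x0, y0, x1, y1, ...] to {joint_name: (x, y)}."""
    if len(outputs) != 2 * len(JOINTS):
        raise ValueError(f"expected {2 * len(JOINTS)} values, got {len(outputs)}")
    return {name: (outputs[2 * i], outputs[2 * i + 1]) for i, name in enumerate(JOINTS)}


def pose_l2_loss(pred_pose, true_pose):
    """Sum of squared coordinate errors over all joints."""
    return sum(
        (pred_pose[name][0] - true_pose[name][0]) ** 2
        + (pred_pose[name][1] - true_pose[name][1]) ** 2
        for name in JOINTS
    )


# Hypothetical network output and ground truth for one image.
predicted = decode_pose([float(v) for v in range(28)])
ground_truth = dict(predicted)
ground_truth["neck"] = (25.0, 25.0)  # the prediction is off by 1 pixel in x
print(pose_l2_loss(predicted, ground_truth))
```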


      Stanford University School of Engineering: Fei-Fei Li, Justin Johnson, Serena Yeung: Convolutional Neural Networks for Visual Recognition: Lecture 11 | Detection and Segmentation (YouTube)

