Tuesday, 7 January 2020

Object Localization (Classification with Localization)

Goal


Predict the class of the main subject of the image and its location.

Input 

  • image
  • list of labels (categories/classes)

Output

  • prediction of the class of the main subject of the image
  • prediction of the position of that object in the image (its bounding box: the smallest rectangle that completely contains it)
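
To make the bounding box definition above concrete, here is a minimal sketch in plain Python. The helper names (bounding_box, corners_to_center) and the sample pixel coordinates are hypothetical and chosen only for illustration. It assumes an axis-aligned box and shows the two common encodings: corner coordinates, and center plus width and height.

```python
# Hypothetical helpers for illustration; they are not from the lecture.

def bounding_box(points):
    """Smallest axis-aligned rectangle (x_min, y_min, x_max, y_max) containing all (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def corners_to_center(box):
    """Convert (x_min, y_min, x_max, y_max) to (center_x, center_y, width, height)."""
    x_min, y_min, x_max, y_max = box
    width = x_max - x_min
    height = y_max - y_min
    return x_min + width / 2, y_min + height / 2, width, height


# Made-up pixel coordinates that belong to the object.
object_pixels = [(40, 30), (120, 45), (90, 160), (55, 100)]
box = bounding_box(object_pixels)
print(box)                     # (40, 30, 120, 160)
print(corners_to_center(box))  # (80.0, 95.0, 80, 130)
```

The center/width/height form is the one the 4-node regression layer in the Architecture section is described as predicting.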


Method


Traditional: feature detection (HOG, Haar-like, ...) + classification (SVM, ...)

CNN: feature extraction + classification and bounding box prediction (regression). The model learns both the class and the location.
 

Architecture


Typical architecture:

A CNN whose feature vector is fully connected both to a softmax layer (classifier: outputs class probabilities) and to a 4-node layer (regressor: outputs bounding box coordinates and dimensions):

  • input layer
  • CNN backbone (e.g. AlexNet)
  • feature vector (the output of the network's last hidden layer, which summarizes the content of the image); 4096 nodes in AlexNet
  • fully connected layer that outputs class scores; connects the 4096 feature vector nodes with e.g. 1000 nodes, one per class; treats recognition as a classification problem
  • another fully connected layer that outputs bounding box coordinates; connects the 4096 feature vector nodes with 4 nodes (height, width and the two coordinates of the center) in the Box Coordinates layer; treats localization as a regression problem

Fully supervised setting: for each image we have an annotated ground truth (correct) label and box coordinates.
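
The two heads described above can be sketched in plain Python. This is only an illustration under stated assumptions: the sizes are toy values (8 features and 3 classes instead of 4096 and 1000), the weights are random rather than trained, the feature vector is a random stand-in for the backbone output, and all names (linear, random_layer, localize) are hypothetical.

```python
import math
import random


def linear(weights, bias, features):
    """Fully connected layer: one output per row of weights."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(n_out, n_in, rng):
    """Untrained layer: small random weights and zero biases."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


# Toy sizes (assumed): 8 features and 3 classes instead of 4096 and 1000.
FEATURES, CLASSES = 8, 3
rng = random.Random(0)
class_w, class_b = random_layer(CLASSES, FEATURES, rng)  # classification head
box_w, box_b = random_layer(4, FEATURES, rng)            # regression head


def localize(features):
    """Both heads read the same feature vector."""
    class_probs = softmax(linear(class_w, class_b, features))
    box = linear(box_w, box_b, features)  # (center_x, center_y, width, height)
    return class_probs, box


features = [rng.random() for _ in range(FEATURES)]  # stand-in for the backbone output
class_probs, box = localize(features)
print(len(class_probs), len(box))  # 3 4
```

The point of the sketch is the shared input: the classifier and the regressor are separate layers, but both are computed from the same feature vector.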

Loss Function


During training (backpropagation), assuming the fully supervised setting, we have two losses:

  • one for the predicted category, which describes the difference between the correct label and the predicted class scores: Softmax Loss (this is actually a cross-entropy loss, which is the standard loss function for a softmax layer [Is the softmax loss the same as the cross-entropy loss? - Quora])
  • another one for the predicted box coordinates: L2 (least squares error) loss, which measures the dissimilarity between the predicted and the ground truth bounding box [What Are L1 and L2 Loss Functions?]

The total loss function is a multi-task loss: a weighted sum of these two losses.
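
A minimal sketch of this multi-task loss in plain Python. The function names, the sample values and the box_weight parameter are assumptions made for illustration. In particular, placing the weight only on the box term is one common way to write the weighted sum, not something the lecture prescribes, and the box values are assumed to be normalized to the image size.

```python
import math


def cross_entropy(class_probs, true_class):
    """Softmax (cross-entropy) loss: negative log-probability of the correct class."""
    return -math.log(class_probs[true_class])


def l2_loss(pred_box, true_box):
    """Sum of squared differences between predicted and ground truth box values."""
    return sum((p - t) ** 2 for p, t in zip(pred_box, true_box))


def multitask_loss(class_probs, true_class, pred_box, true_box, box_weight=1.0):
    """Weighted sum of the classification loss and the localization loss."""
    return cross_entropy(class_probs, true_class) + box_weight * l2_loss(pred_box, true_box)


# Made-up example: 3 classes, boxes as (center_x, center_y, width, height).
class_probs = [0.7, 0.2, 0.1]    # softmax output
true_class = 0                   # index of the correct label
pred_box = [0.5, 0.5, 0.4, 0.3]
true_box = [0.6, 0.5, 0.4, 0.5]

loss = multitask_loss(class_probs, true_class, pred_box, true_box, box_weight=2.0)
print(round(loss, 4))  # 0.4567
```

The weight is a hyperparameter: it sets how much a localization error counts relative to a classification error, so it has to be tuned rather than derived.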

Human Pose Estimation


The same idea of predicting a fixed number of positions in the image is also applied to Human Pose Estimation:

  • input: image of a person
  • output: positions/coordinates of the joints (e.g. 14 joints: left/right foot, knee, hip, shoulder, elbow, hand; neck, head top)
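
As a sketch of what this output looks like, the 14 joints listed above can be treated as 14 (x, y) positions, i.e. 28 regressed numbers. The example below assumes the same L2 regression loss as for the bounding box, summed over all joints. The names and the sample coordinates are hypothetical.

```python
PAIRED = ["foot", "knee", "hip", "shoulder", "elbow", "hand"]
JOINTS = [f"{side} {part}" for part in PAIRED for side in ("left", "right")]
JOINTS += ["neck", "head top"]


def pose_l2_loss(pred, truth):
    """Sum over all joints of the squared distance between predicted and true position."""
    return sum((pred[j][0] - truth[j][0]) ** 2 + (pred[j][1] - truth[j][1]) ** 2
               for j in JOINTS)


# Made-up ground truth, and a prediction that is off by 1 pixel in x for every joint.
truth = {name: (float(i), float(2 * i)) for i, name in enumerate(JOINTS)}
pred = {name: (x + 1.0, y) for name, (x, y) in truth.items()}

print(len(JOINTS))                # 14
print(pose_l2_loss(pred, truth))  # 14.0
```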

References:


Stanford University School of Engineering: Fei-Fei Li, Justin Johnson, Serena Yeung: Convolutional Neural Networks for Visual Recognition: Lecture 11 | Detection and Segmentation. Link: Lecture 11 | Detection and Segmentation - YouTube
