Wednesday, 8 January 2020

Object Detection with Fast R-CNN


Fast R-CNN improves on the R-CNN method, whose main problems were speed, memory footprint and accuracy.


To avoid processing each RoI separately (i.e. passing every RoI cropped from the input image through the CNN), Fast R-CNN does not apply the region proposals directly to the input image but to its convolutional feature map. Compared to R-CNN, Fast R-CNN effectively swaps the order of the region proposal step and the CNN in the network architecture.

Network Architecture


Inputs:

  • image
  • set of predefined labels (categories/classes)

The entire image is run through a stack of convolutional layers once, producing a high-resolution convolutional feature map for the whole image. For example, if the network has 5 convolutional layers, the output of the 5th is denoted conv5, giving the conv5 feature map.

Fast R-CNN Network Architecture
Image credit: Ross Girshick: "Fast R-CNN"

A fixed region proposal method (e.g. Selective Search) is still used. But rather than cropping out the image pixels corresponding to each region proposal, we project the region proposals onto the convolutional feature map and take the crops from the feature map instead of directly from the image.

This lets us share the expensive convolutional computation across all crops from the same image, which matters when there are many crops per image.
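
The projection step can be sketched as follows. This is a minimal illustration, not the paper's code: it assumes the backbone downsamples the image by a total stride of 16 (as VGG16's conv5 does), and `project_roi`, `STRIDE` and the proposal boxes are hypothetical. Rounding to the nearest feature-map cell is just one simple choice for mapping coordinates.

```python
# Hypothetical helper (not from the paper's code): map an RoI given in
# input-image pixels onto the conv feature map, ASSUMING the backbone
# downsamples by a total stride of 16 (as VGG16's conv5 does).
STRIDE = 16


def project_roi(box, stride=STRIDE):
    """box = (x1, y1, x2, y2) in image pixels -> same box in feature-map cells."""
    scale = 1.0 / stride
    # Round each coordinate to the nearest feature-map cell.
    return tuple(round(coord * scale) for coord in box)


# Made-up proposals, e.g. as a region proposal method might return them.
proposals = [(0, 0, 320, 240), (480, 160, 800, 560)]

# The backbone runs ONCE on the whole image; every proposal is then cropped
# from that same feature map instead of re-running the CNN for each crop.
for box in proposals:
    print(box, "->", project_roi(box))
```

With 2000 proposals, R-CNN would need 2000 CNN forward passes per image, while this scheme needs one pass plus 2000 cheap crops.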

The downstream fully connected layers expect a fixed-size input, so the crops from the convolutional feature map must be reshaped to a fixed size. This is done in a differentiable way by the RoI pooling layer. RoI pooling looks much like max pooling: each crop is divided into a fixed grid of sub-windows (7 × 7 for VGG16 in the paper), and each sub-window is max-pooled into a single value.
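
A minimal single-channel sketch of RoI max pooling, assuming the RoI is already given in (inclusive) feature-map cell coordinates. Sub-window boundaries use floor/ceil so the bins always cover the whole RoI; this is one common way to handle RoI sizes that do not divide evenly, not necessarily the exact rule of any particular implementation. The 2 × 2 output grid and the feature values are made up for readability; a real layer pools every channel independently into a larger grid.

```python
import math


def roi_max_pool(feature, roi, out_h=2, out_w=2):
    """Max-pool one RoI of a single-channel feature map into an out_h x out_w grid.

    feature: 2D list [rows][cols]
    roi:     (x1, y1, x2, y2), inclusive, in feature-map cells
    """
    x1, y1, x2, y2 = roi
    bin_h = (y2 - y1 + 1) / out_h  # sub-window height (may be fractional)
    bin_w = (x2 - x1 + 1) / out_w  # sub-window width (may be fractional)
    out = [[0.0] * out_w for _ in range(out_h)]
    for ph in range(out_h):
        # floor/ceil keep every sub-window non-empty and inside the RoI
        hs = y1 + math.floor(ph * bin_h)
        he = y1 + math.ceil((ph + 1) * bin_h)
        for pw in range(out_w):
            ws = x1 + math.floor(pw * bin_w)
            we = x1 + math.ceil((pw + 1) * bin_w)
            out[ph][pw] = max(feature[r][c] for r in range(hs, he) for c in range(ws, we))
    return out


feature = [
    [1, 3, 2, 0],
    [4, 6, 5, 1],
    [7, 2, 9, 8],
    [0, 1, 3, 4],
]
print(roi_max_pool(feature, (0, 0, 3, 3)))
```

Whatever the RoI size, the output always has the same out_h × out_w shape, which is exactly what the fully connected layers need. The max over each sub-window passes gradients back to the arg-max cell, which keeps the layer differentiable.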

The resulting fixed-size crops are then passed through fully connected layers, which predict for each RoI:
  • classification scores - Softmax classifier (Linear + softmax)
  • offsets to the bounding boxes - Bounding box regressors (linear regression)
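
The two per-RoI outputs can be illustrated with a small sketch. It assumes the box parameterization Fast R-CNN inherits from R-CNN (center shifts scaled by the proposal's width/height, log-space scaling of width/height), and that the regressor outputs one (dx, dy, dw, dh) tuple per class, of which the tuple for the predicted class is applied. The class names, scores and offsets are made up for illustration.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def apply_deltas(proposal, deltas):
    """proposal = (cx, cy, w, h); deltas = (dx, dy, dw, dh) -> refined (cx, cy, w, h)."""
    px, py, pw, ph = proposal
    dx, dy, dw, dh = deltas
    return (px + pw * dx, py + ph * dy, pw * math.exp(dw), ph * math.exp(dh))


class_names = ["background", "cat", "dog"]  # hypothetical K + 1 = 3 classes
scores = [0.5, 2.0, 1.0]                    # made-up raw scores for one RoI
deltas_per_class = [                        # made-up offsets, one tuple per class
    (0.0, 0.0, 0.0, 0.0),
    (0.1, -0.05, 0.0, math.log(1.5)),
    (0.0, 0.0, 0.2, 0.2),
]

probs = softmax(scores)
k = max(range(len(probs)), key=probs.__getitem__)  # predicted class index
box = apply_deltas((100.0, 100.0, 50.0, 80.0), deltas_per_class[k])
print(class_names[k], [round(p, 3) for p in probs], box)
```

Here the "cat" offsets shift the proposal's center slightly and stretch its height by a factor of 1.5.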

Fast R-CNN Network Architecture
Image source: Fei-Fei Li, Justin Johnson, Serena Yeung (Stanford University School of Engineering): Convolutional Neural Networks for Visual Recognition, Lecture 11 - Detection and Segmentation 

Training & Loss Function

During training, a multi-task loss trades off between the two tasks listed above:

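Total Loss = Log loss (Softmax classifier) + λ · [u ≥ 1] · Smooth L1 loss (BBox regressor)

where u is the ground-truth class of the RoI (u = 0 is background), the indicator [u ≥ 1] switches off the box loss for background RoIs (they have no ground-truth box), and λ balances the two terms (the paper uses λ = 1).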

Because RoI pooling is differentiable, we can backpropagate through the entire network and learn it all jointly.
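
The loss for a single RoI can be sketched as below, assuming the form given in the Fast R-CNN paper: the negative log probability of the true class u, plus a smooth L1 box loss summed over (x, y, w, h) that is applied only to non-background RoIs (u ≥ 1) and weighted by λ = 1. The function names and example values are hypothetical; in training this loss is averaged over the RoIs of a mini-batch.

```python
import math


def smooth_l1(x):
    """Quadratic near zero, linear for |x| >= 1 (less sensitive to outliers than L2)."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5


def fast_rcnn_loss(probs, u, t_u, v, lam=1.0):
    """Multi-task loss for one RoI.

    probs: softmax output over K + 1 classes
    u:     ground-truth class index (0 = background)
    t_u:   predicted box offsets for class u
    v:     target box offsets
    """
    l_cls = -math.log(probs[u])  # log loss of the true class
    l_loc = sum(smooth_l1(t - g) for t, g in zip(t_u, v))
    return l_cls + lam * (1 if u >= 1 else 0) * l_loc


probs = [0.1, 0.7, 0.2]
print(fast_rcnn_loss(probs, u=1, t_u=(0.1, 0.2, 0.0, 1.5), v=(0.0, 0.0, 0.0, 0.0)))
print(fast_rcnn_loss(probs, u=0, t_u=(5.0, 5.0, 5.0, 5.0), v=(0.0, 0.0, 0.0, 0.0)))
```

In the second call the RoI is background, so even very wrong box offsets add nothing to the loss.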

Benchmarks and Problems

Training times (hours):

  • R-CNN: 84
  • SPP-Net: 25.5
  • Fast R-CNN: 8.75

Test time per image (seconds, including / excluding region proposals):

  • R-CNN: 49/47
  • SPP-Net: 4.3/2.3
  • Fast R-CNN: 2.3/0.32

Comparing R-CNN, SPP-net (which sits roughly in between) and Fast R-CNN, we see that Fast R-CNN trains about 10 times faster than R-CNN, because the convolutional computation is shared across all region proposals of an image.
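
The speed-up factors follow directly from the numbers listed above. A quick check, using only those numbers:

```python
train_hours = {"R-CNN": 84, "SPP-Net": 25.5, "Fast R-CNN": 8.75}
test_seconds = {  # per image: (including, excluding) region proposals
    "R-CNN": (49, 47),
    "SPP-Net": (4.3, 2.3),
    "Fast R-CNN": (2.3, 0.32),
}


def speedup(baseline, other):
    """How many times faster `other` is than `baseline` (ratio of run times)."""
    return baseline / other


for name in train_hours:
    train = speedup(train_hours["R-CNN"], train_hours[name])
    test_with = speedup(test_seconds["R-CNN"][0], test_seconds[name][0])
    test_without = speedup(test_seconds["R-CNN"][1], test_seconds[name][1])
    print(f"{name}: train {train:.1f}x, test {test_with:.1f}x / {test_without:.1f}x")
```

With these figures Fast R-CNN trains about 9.6x faster than R-CNN, and a test image is processed about 21x faster including proposals (about 147x excluding them).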

At test time Fast R-CNN is so fast that its total computation time is actually dominated by computing the region proposals.

Computing about 2000 region proposals with Selective Search takes ~2 seconds. Once the proposals are available, the expensive convolutions are shared across the entire image, so all of the proposals can be processed together in well under a second. Fast R-CNN therefore ends up bottlenecked by the computation of the region proposals alone.
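
The size of that bottleneck can be read off the Fast R-CNN test times listed earlier (2.3 s including proposals, 0.32 s excluding them); the proposal time is simply their difference:

```python
total_s = 2.3     # Fast R-CNN test time per image, including region proposals
network_s = 0.32  # Fast R-CNN test time per image, excluding region proposals

proposal_s = total_s - network_s  # time spent on region proposals
share = proposal_s / total_s      # fraction of the total spent on proposals
print(f"proposals: {proposal_s:.2f} s ({share:.0%} of the total)")
```

Roughly 1.98 s of the 2.3 s (about 86%) goes to proposals, so even an infinitely fast network would cut the per-image time by only about 14%.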


Fei-Fei Li, Justin Johnson, Serena Yeung: Lecture 11 | Detection and Segmentation (YouTube)

Ross Girshick (2015): Fast R-CNN
