Wednesday, 8 January 2020

Object Detection with Sliding Window


Take an input image and output:

  • predictions of bounding boxes (each box contains an object)
  • class scores for objects within bounding boxes


The idea is to turn detection into a pure classification problem. A classifier outputs only class scores for the entire image it is given, so we take different crops from the input image and feed them, one by one, through our previously trained convolutional network, which makes a classification decision for each crop. The classifier is run at evenly spaced locations over the entire image.

In addition to the object labels, we also add background as a classification category. The network can then predict background whenever a crop contains none of the categories we care about.

Sliding Windows.
Original image of animals taken from

So we have a rectangular "window" which slides across the input image, and the classifier outputs a prediction only for the crop visible through that window. The window can take various sizes and aspect ratios and can move in smaller or larger steps (strides), so the classifier will output higher scores for some classes on some crops than on others.
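To make the role of window size and stride concrete, here is a minimal sketch that only enumerates window positions. The function name `sliding_window_boxes` and the image and window dimensions are assumptions chosen for illustration; they do not come from the lecture.

```python
def sliding_window_boxes(img_w, img_h, win_w, win_h, stride):
    """Return (x, y, w, h) boxes for every window position that fits in the image."""
    boxes = []
    # Move the window top-to-bottom, left-to-right in steps of `stride` pixels.
    for y in range(0, img_h - win_h + 1, stride):
        for x in range(0, img_w - win_w + 1, stride):
            boxes.append((x, y, win_w, win_h))
    return boxes


# Same hypothetical 640x480 image, two window shapes and two strides.
square_small_step = sliding_window_boxes(640, 480, 128, 128, stride=32)
wide_large_step = sliding_window_boxes(640, 480, 256, 128, stride=64)
print(len(square_small_step), len(wide_large_step))
```

A smaller stride or a smaller window gives more boxes, and each box is one crop the classifier has to look at.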


Image --> [ Sliding Window cropping --> crop --> Classifier --> class scores ]

The process within the square brackets has to be repeated once for every crop we use.
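The loop can be sketched roughly as follows. This is only an illustration: `classify` is a toy stand-in for the trained CNN (it scores a crop by mean brightness), and the class list, threshold and tiny grayscale image are all made-up assumptions.

```python
CLASSES = ["background", "cat", "dog"]  # hypothetical label set


def classify(crop):
    """Stand-in for the trained CNN: returns one score per class.

    Toy rule for illustration only: bright crops count as "cat".
    """
    pixels = [p for row in crop for p in row]
    mean = sum(pixels) / len(pixels)
    if mean > 0.5:
        return [0.1, 0.8, 0.1]
    return [0.9, 0.05, 0.05]


def detect(image, win_w, win_h, stride, threshold=0.5):
    """Crop each window, classify it, keep confident non-background results."""
    img_h, img_w = len(image), len(image[0])
    detections = []
    for y in range(0, img_h - win_h + 1, stride):
        for x in range(0, img_w - win_w + 1, stride):
            # Sliding Window cropping --> crop
            crop = [row[x:x + win_w] for row in image[y:y + win_h]]
            # crop --> Classifier --> class scores
            scores = classify(crop)
            best = max(range(len(scores)), key=scores.__getitem__)
            if CLASSES[best] != "background" and scores[best] >= threshold:
                detections.append(((x, y, win_w, win_h), CLASSES[best], scores[best]))
    return detections


# 4x4 grayscale image with a bright 2x2 patch in the top-right corner.
image = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
detections = detect(image, win_w=2, win_h=2, stride=1)
print(detections)
```

Windows whose best class is background are simply dropped, so only the crop that fully covers the bright patch survives.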


There could be any number of objects in the image, and they could appear at any location, at any size and at any aspect ratio, so a brute-force sliding window approach ends up having to test a very large number of crops.

When every one of those crops has to be fed through a large convolutional network, this becomes computationally intractable. So in practice people don't use this sort of brute-force sliding window approach for object detection with convolutional networks.
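A quick back-of-the-envelope count shows how fast the number of crops grows. The image size, scales, aspect ratios and stride below are arbitrary assumptions picked for illustration, and `count_crops` is a hypothetical helper.

```python
def count_crops(img_w, img_h, window_sizes, stride):
    """Count the window positions over all (width, height) window sizes."""
    total = 0
    for win_w, win_h in window_sizes:
        if win_w > img_w or win_h > img_h:
            continue  # this window does not fit in the image at all
        nx = (img_w - win_w) // stride + 1  # positions along x
        ny = (img_h - win_h) // stride + 1  # positions along y
        total += nx * ny
    return total


scales = [64, 128, 256]          # approximate window side in pixels
aspect_ratios = [0.5, 1.0, 2.0]  # width / height
window_sizes = [
    (int(s * r ** 0.5), int(s / r ** 0.5))
    for s in scales
    for r in aspect_ratios
]
n = count_crops(640, 480, window_sizes, stride=8)
print(n)
```

Even with only three scales and three aspect ratios on a small image, this comes to well over ten thousand crops, each of which would need its own forward pass through the network.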

There are two main approaches that try to improve on Sliding Window.

One family of detectors tries to reduce the number of crops by proposing Regions of Interest (region-proposal detectors). They still perform classification sequentially on each RoI.

Another approach uses a single pass of the image through the CNN (single-shot detectors). OverFeat is an example of such a detector.


Lecture 11 | Detection and Segmentation - YouTube

