Sunday, 12 January 2020

Semantic Segmentation


  • image (pixels)
  • list of categories


Each pixel in the image to be classified (to be assigned a category label).
Don't differentiate instances (objects), only care about pixels.


  • Every input pixel is assigned a category
  • Pixels of each category are painted with the same color e.g. grass, cat, tree, sky
  • If two instances of the same object are next to each other, entire area will have the same label and will be painted with same color


Approach #1: Sliding Window 

Approach #1 is to use sliding window where we are moving a small window across the image and apply DNN classification to determine the class of the crop which is then assigned to the central pixel of the crop.

This would be very computationally expensive as we'd need to classify (push crop through CNN) separate crop for each pixel in the image.

This would also be very inefficient for not reusing shared features between overlapping patches. If two patches overlap then the convolutional features of these patches will end up going through the same convolutional layers and we can actually share a lot of computation when applying this to separate passes or applying this type of approach to separate patches of the image. 

Approach #2: CNN, layers keep spatial size

Using fully convolutional network where whole network is a giant stack of convolutional layers with no fully connected layers where each convolutional layer preserves the spatial size of the input:

input image --> [conv] --> output image
  • input 3 x H x W
  • convolutions D x H x W: conv --> conv --> conv --> conv--> 
  • scores: C x H x W (C is the number of categories/labels)
  • argmax ==> Predictions H x W
Final convolutional layer outputs tensor C x H x W.

The size of the output image has to be the same as the input image as we want to have classification for each pixel, output image has to be pixel-perfect, with sharp and clear borders between segments.

All computations are done in one pass. 

Using convolutional layers which are keeping the same spatial size as the input image is super expensive and would take lots of memory for the huge number of parameters required (high resolution input image, input in each layer has multiple channels...).

Approach #3: CNN, downsampling + upsampling

Design network as a bunch of convolutional layers, with downlsampling and upsampling of the feature map inside the network:

input image --> [conv --> downsampling] --> [conv --> upsampling] --> output image

  • spatial information gets lost
  • e.g. max pooling, strided convolution

Upsampling: max unpooling or strided transpose convolution.


Put classification loss at every pixel at the output, take an average through space and train it through normal back propagation.


Creating training set is expensive and long manual process. Each pixel has to be labelled. There are some tools for drawing contours and filling in the regions.

Loss Function

Loss function: cross-entropy loss is computed for each pixel in the output and ground truth pixels; then sum or average is taken over space or mini-batch.


Individual instances of the same category are not differentiated. This is improved with Instance Segmentation.


Stanford University School of Engineering - Convolutional Neural Networks for Visual Recognition - Lecture 11 | Detection and Segmentation - YouTube

1 comment:

micheal pan said...

BE SMART AND BECOME RICH IN LESS THAN 3DAYS....It all depends on how fast 
you can be to get the new PROGRAMMED blank ATM card that is capable of
hacking into any ATM machine,anywhere in the world. I got to know about 
this BLANK ATM CARD when I was searching for job online about a month 
ago..It has really changed my life for good and now I can say I'm rich and 
I can never be poor again. The least money I get in a day with it is about 
$50,000.(fifty thousand USD) Every now and then I keeping pumping money 
into my account. Though is illegal,there is no risk of being caught 
,because it has been programmed in such a way that it is not traceable,it 
also has a technique that makes it impossible for the CCTVs to detect 
you..For details on how to get yours today, email the hackers on : ( ). Tell your 
loved once too, and start to live large. That's the simple testimony of how 
my life changed for good...Love you all ...the email address again is ;