Wednesday, 19 February 2020

How to install and use Boost C++ Libraries in CMake project on Ubuntu

Download the Boost 1.72.0 archive (the current version at the time of writing).

Go to the directory where you want to install Boost:

$ cd ~/dev

Unpack the downloaded archive (tar will create boost_1_72_0 directory):

$ tar --bzip2 -xf ~/Downloads/boost_1_72_0.tar.bz2 
$ ls
avast  boost_1_72_0  github  gnome  go  ssh-keys  vm  vm-shared

Let's see the content of ~/dev/boost_1_72_0/:

~/dev/boost_1_72_0$ ls -A1

Let's see the content of ~/dev/boost_1_72_0/boost/:

~/dev/boost_1_72_0/boost$ ls -A1


Many Boost libraries are header-only, which means we don't need to build them; it is only necessary to include the relevant header in the source code and pass the path to Boost to the compiler via the -I option (e.g. -I ~/dev/boost_1_72_0).

Boost documentation page Getting Started on Unix Variants contains a short demo code and I'm going to use it here. It uses the boost/lambda/lambda.hpp header, so let's see the content of ~/dev/boost_1_72_0/boost/lambda/ to confirm it's there:

~/dev/boost_1_72_0/boost/lambda$ ls -A1

Let's create a project in VSCode with the following structure:



.vscode/tasks.json:

{
    // See the VSCode documentation for details about the tasks.json format.
    "version": "2.0.0",
    "tasks": [
        {
            "type": "shell",
            "label": "CMake && make",
            "options": {
                "cwd": "${workspaceFolder}/build"
            },
            "command": "cmake -DCMAKE_BUILD_TYPE=Debug .. && make -j 4",
            "group": {
                "kind": "build",
                "isDefault": true
            }
        }
    ]
}

src/main.cpp:

#include <boost/lambda/lambda.hpp>
#include <iostream>
#include <iterator>
#include <algorithm>

int main() {
    std::cout << "main()" << std::endl;

    using namespace boost::lambda;
    typedef std::istream_iterator<int> in;
    std::for_each(in(std::cin), in(), std::cout << (_1 * 3) << " ");

    return 0;
}

CMakeLists.txt:

cmake_minimum_required(VERSION 3.0)
project(boost-demo)
set(SOURCE src/main.cpp)
set(BOOST_ROOT "$ENV{HOME}/dev/boost_1_72_0")
include_directories(${BOOST_ROOT})
add_executable(${PROJECT_NAME} ${SOURCE})

Press CTRL+SHIFT+B in order to trigger building the project:

> Executing task: cmake -DCMAKE_BUILD_TYPE=Debug .. && make -j 4 <

-- Configuring done
-- Generating done
-- Build files have been written to: /home/bojan/dev/github/boost-demo/build
Scanning dependencies of target boost-demo
[ 50%] Building CXX object CMakeFiles/boost-demo.dir/src/main.cpp.o
[100%] Linking CXX executable boost-demo
[100%] Built target boost-demo

Terminal will be reused by tasks, press any key to close it.

The output is an executable which we can run now:

~/dev/github/boost-demo$ ./build/boost-demo 
3 2
6 3
9 4
12 7
21 ^C

The example above shows how simple it is to include Boost headers in a C++ CMake project.

For some Boost features, we need to compile and install certain Boost libraries.

How to compile and install Boost binaries?

~/dev/boost_1_72_0$ ./bootstrap.sh --help
`./bootstrap.sh' prepares Boost for building on a few kinds of systems.

Usage: ./bootstrap.sh [OPTION]...

Defaults for the options are specified in brackets.

  -h, --help                display this help and exit
  --with-bjam=BJAM          use existing Boost.Jam executable (bjam)
                            [automatically built]
  --with-toolset=TOOLSET    use specific Boost.Build toolset
                            [automatically detected]
  --show-libraries          show the set of libraries that require build
                            and installation steps (i.e., those libraries
                            that can be used with --with-libraries or
                            --without-libraries), then exit
  --with-libraries=list     build only a particular set of libraries,
                            described using either a comma-separated list of
                            library names or "all"
  --without-libraries=list  build all libraries except the ones listed []
  --with-icu                enable Unicode/ICU support in Regex 
                            [automatically detected]
  --without-icu             disable Unicode/ICU support in Regex
  --with-icu=DIR            specify the root of the ICU library installation
                            and enable Unicode/ICU support in Regex
                            [automatically detected]
  --with-python=PYTHON      specify the Python executable [python]
  --with-python-root=DIR    specify the root of the Python installation
                            [automatically detected]
  --with-python-version=X.Y specify the Python version as X.Y
                            [automatically detected]

Installation directories:
  --prefix=PREFIX           install Boost into the given PREFIX
  --exec-prefix=EPREFIX     install Boost binaries into the given EPREFIX

More precise control over installation directories:
  --libdir=DIR              install libraries here [EPREFIX/lib]
  --includedir=DIR          install headers here [PREFIX/include]

To Be Continued...


Building C++ application on Ubuntu with CMake, make and VSCode

This article shows how to set up building a C++ project with CMake and make in VSCode on Ubuntu.

First make sure cmake and make are installed:

$ cmake --version
cmake version 3.10.2
CMake suite maintained and supported by Kitware (kitware.com/cmake).

$ make --version
GNU Make 4.1
Built for x86_64-pc-linux-gnu
Copyright (C) 1988-2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Let's assume we have a C++ project with the following structure:


build is an empty directory; this is where the build output will be located.


src/main.cpp:

#include <iostream>

int main() {
    std::cout << "main()" << std::endl;
    return 0;
}

CMakeLists.txt:

cmake_minimum_required(VERSION 3.0)
project(cmake-demo)
set(SOURCE src/main.cpp)
add_executable(${PROJECT_NAME} ${SOURCE})

In VSCode, press CTRL+SHIFT+P or go to View >> Command Palette, select Tasks: Configure Default Build Task and then Create tasks.json file from template and then Other.

VSCode - Configure Default Build Task
VSCode - Command Palette

This will create ./.vscode/tasks.json with a sample task:

{
    // See the VSCode documentation for details about the tasks.json format.
    "version": "2.0.0",
    "tasks": [
        {
            "label": "echo",
            "type": "shell",
            "command": "echo Hello"
        }
    ]
}
We can replace it with a cmake and make toolchain:

{
    // See the VSCode documentation for details about the tasks.json format.
    "version": "2.0.0",
    "tasks": [
        {
            "type": "shell",
            "label": "CMake && make",
            "options": {
                "cwd": "${workspaceFolder}/build"
            },
            "command": "cmake -DCMAKE_BUILD_TYPE=Debug .. && make -j 4",
            "group": {
                "kind": "build",
                "isDefault": true
            }
        }
    ]
}

To build a project press CTRL+SHIFT+B or go to Terminal >> Run Build Task.

We'd get the output like this:

> Executing task: cmake -DCMAKE_BUILD_TYPE=Debug .. && make -j 4 <

-- Configuring done
-- Generating done
-- Build files have been written to: /home/xxxxxxx/xx/xxx/cmake-demo/build
Scanning dependencies of target cmake-demo
[ 50%] Building CXX object CMakeFiles/cmake-demo.dir/src/main.cpp.o
[100%] Linking CXX executable cmake-demo
[100%] Built target cmake-demo

Terminal will be reused by tasks, press any key to close it.

To run the executable (build output):

$ ./build/cmake-demo 


What is a CMake generator? - Stack Overflow
Ubuntu Manpage: cmake-generators - CMake Generators Reference
cmake(1) — CMake 3.5.2 Documentation
Visual Studio Code Setup for Beginners using C++ and CMake
C++ Development using Visual Studio Code, CMake and LLDB
How to compile C++ code with VS Code, CMake and NMake - 40tude
GCC and Make - A Tutorial on how to compile, link and build C/C++ applications
c++ - How does CMake choose gcc and g++ for compiling? - Stack Overflow

Friday, 14 February 2020

Running pgAdmin in Docker container

pgAdmin is a browser-based DB client. It is possible to run it in a Docker container - an image is available at DockerHub: dpage/pgadmin4.

I assume we are also running a Postgres DB Docker container.

To run the pgAdmin Docker container on the same network as the Postgres DB container, execute (the credentials below are example values):

$ docker run \
-p 5051:5051 \
-d \
-e "PGADMIN_DEFAULT_EMAIL=user@domain.com" \
-e "PGADMIN_DEFAULT_PASSWORD=SuperSecret" \
-e "PGADMIN_LISTEN_PORT=5051" \
--rm \
--name pgadmin \
--network my_network_default \
dpage/pgadmin4

my_network_default is the name of the Docker network on which the Postgres DB container is running. This allows using the DB service name (as specified in docker-compose.yml) as the DB hostname when adding a DB server in pgAdmin4. This is possible if the DB container is run via docker-compose.

Once this container is up we can go to http://localhost:5051 in local browser and log in with credentials specified via PGADMIN_DEFAULT_EMAIL and PGADMIN_DEFAULT_PASSWORD. To add a new DB server we need to know either its hostname or its IP address.

If both containers are running on the same Docker network (which is our case here as we use --network) then for DB container hostname we can simply use the name of the DB service from docker-compose.yml e.g. db.

If we want to go via the DB container IP address route, we can find the address by inspecting the network of the Postgres container:

$ docker inspect my_network_default

If you're using VSCode for development, use Docker plugin feature NETWORKS, right-click network of interest and select inspect.

Resolving Issues

In case of any issues, remove -d from this command line in order to run the container undetached, in which case all useful output will appear in the terminal.

How to update pgAdmin Docker image? 

To update pgAdmin Docker image and then run it, use:

$ docker pull dpage/pgadmin4 && docker run (...) dpage/pgadmin4

How to Persist Data between pgAdmin sessions?

If we restart the pgAdmin container, all servers we added before will be lost and we'll need to add them again. To prevent that we can use a local host directory as a mounted volume for persistence:

$ docker run \
-p 5051:5051 \
-d \
-e "PGADMIN_DEFAULT_EMAIL=user@domain.com" \
-e "PGADMIN_DEFAULT_PASSWORD=SuperSecret" \
-e "PGADMIN_LISTEN_PORT=5051" \
--rm \
--name pgadmin \
--network my_network_default \
-v "$(pwd)/pgadmin_data/servers.json":/pgadmin4/servers.json \
-v "$(pwd)/pgadmin_data/pgadmin":/var/lib/pgadmin \
dpage/pgadmin4

It is not necessary to manually create the local directory ./pgadmin_data/; Docker will create it.

From pgAdmin4 docs:

/var/lib/pgadmin - This is the working directory in which pgAdmin stores session data, user files, configuration files, and its configuration database. Mapping this directory onto the host machine gives you an easy way to maintain configuration between invocations of the container.
/pgadmin4/servers.json - If this file is mapped, server definitions found in it will be loaded at launch time. This allows connection information to be pre-loaded into the instance of pgAdmin in the container. Note that server definitions are only loaded on first launch, i.e. when the configuration database is created, and not on subsequent launches using the same configuration database.
pgAdmin runs as the pgadmin user (UID: 5050) in the pgadmin group (GID: 5050) in the container. You must ensure that all files are readable, and where necessary (e.g. the working/session directory) writeable, for this user on the host machine. For example:

$ sudo chown -R 5050:5050 ./pgadmin_data/

How to export and view an exported table?

When we use the pgAdmin web application to export DB table my_table to a csv file, we can see that the following command is executed:

"/usr/local/pgsql-11/psql" --command " "\\copy public.my_table (column1_name, column2_name...) TO '<STORAGE_DIR>/my_table.csv' CSV QUOTE '\"' ESCAPE '''';""

The STORAGE_DIR path is defined in pgAdmin's configuration under /pgadmin4/.

We can view the csv file if we attach to the pgadmin container's terminal:

$ docker exec -it pgadmin /bin/sh
/pgadmin4 # ls -la

/var/lib/pgadmin/storage/ # ls
/var/lib/pgadmin/storage/ # cat my_table.csv 

Sunday, 2 February 2020

Debugging C++ program in VSCode on Linux

Let's assume we have VSCode running in Ubuntu 18.04 and the following project structure:


Let's assume we want to name the build output cpp-demo and place it at project's root directory:


To build and debug it we need to have tasks.json and launch.json files in the .vscode directory.



tasks.json contains build instructions. From the Command Palette (CTRL+SHIFT+P), choose Tasks: Configure Default Build Task and then g++ build active file. This will create a ./.vscode/tasks.json file and open it in the editor. A task will be created there and we need to edit it like this:


    "tasks": [
            "type": "shell",
            "label": "g++ build active file",
            "command": "/usr/bin/g++",
            "args": [
            "options": {
                "cwd": "/usr/bin"
            "group": {
                "kind": "build",
                "isDefault": true
    "version": "2.0.0"

To run the build task defined in tasks.json, press Ctrl+Shift+B or from the main menu choose Tasks: Run Build Task. This will run the g++ compiler and create the output binary. To run it, open a new terminal (in VSCode) and run:

$ ./cpp-demo

For the full list of variables used in VSCode configuration files see Visual Studio Code Variables Reference and visual studio code - VSCode environment variables besides ${workspaceRoot} - Stack Overflow.

If we want g++ to produce verbose output, we can add "-v" to the list of arguments.

If we want the linker to create map files (where we can see all mangled names of functions), we can ask for one via the arguments, e.g. by adding "-Wl,-Map=cpp-demo.map".


launch.json contains debugger settings. From the main menu, choose Debug: Add Configuration... and then choose C++ (GDB/LLDB). You'll then see a dropdown for various predefined debugging configurations. Choose g++ build and debug active file. This creates a launch.json file and opens it in the editor. We need to edit it a bit, e.g. to set the correct name of the binary in program:


{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    "version": "0.2.0",
    "configurations": [
        {
            "name": "g++ build and debug active file",
            "type": "cppdbg",
            "request": "launch",
            "program": "${workspaceFolder}/cpp-demo",
            "args": [],
            "stopAtEntry": true,
            "cwd": "${workspaceFolder}",
            "environment": [],
            "externalConsole": false,
            "MIMode": "gdb",
            "setupCommands": [
                {
                    "description": "Enable pretty-printing for gdb",
                    "text": "-enable-pretty-printing",
                    "ignoreFailures": true
                }
            ],
            "preLaunchTask": "g++ build active file",
            "miDebuggerPath": "/usr/bin/gdb"
        }
    ]
}
The folder currently opened in VSCode (cpp-demo in our case) is the workspace, so ${workspaceFolder} contains the path to it.

To start debugging, place some breakpoints in the source code and press F5 or from the main menu choose Debug: Start Debugging.


Get Started with C++ and Mingw-w64 in Visual Studio Code
Get Started with C++ and Windows Subsystem for Linux in Visual Studio Code
Configure launch.json for C/C++ debugging in Visual Studio Code
"g++" and "c++" compiler - Stack Overflow
c++ - What is the difference between g++ and gcc? - Stack Overflow
Compiling with g++

Friday, 31 January 2020

How to disable IPv6 on Ubuntu

Some web servers don't support IPv6 connections and might refuse such connections with a 403 HTTP error (Forbidden).

In that case we need to disable IPv6 on the machine's network interfaces. This is how to do it.

To check first whether IPv6 traffic from your machine is enabled, go to some IP checker website and check whether it shows your IPv6 address.

You can also run

$ ifconfig | grep inet6

...and check if (local) IPv6 addresses are assigned to active interfaces.

To disable IPv6 do the following:

1) Open /etc/sysctl.conf

$ sudo nano /etc/sysctl.conf

2) Append the following lines to the existing configuration and save the file:

net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
net.ipv6.conf.tun0.disable_ipv6 = 1

3) Instruct OS to re-read this config file:

$ sudo sysctl -p

To validate changes, repeat IPv6 validation steps described above.

Sunday, 19 January 2020

Building the Machine Learning Model

I came across this nice diagram which explains how to build a machine learning model, so I am sharing it with you. All credits go to its author, Chanin Nantasenamat.

Monday, 13 January 2020

Viber on PC not syncing? Here is the solution.

I've noticed that Viber on my Ubuntu PC stopped syncing messages with the Viber app on my mobile phone. I didn't find a solution on the Viber Help pages, so I had to find the fix myself. It's actually very simple: you just have to delete one file and restart the application; no Viber reinstall is needed!

Before everything, exit Viber application on PC.

Let's find all Viber files and directories:

$ sudo find / -name "*viber*"

NOTE: 440123456789 is the number of the mobile device to which you've been syncing messages so far.

/home/bojan/.ViberPC/440123456789/viber.db is the file which contains all message history. Let's delete it:

$ rm ~/.ViberPC/440123456789/viber.db 

Open Viber app on mobile phone and re-launch Viber on PC (from Applications or from Terminal, like here):

$ /opt/viber/Viber 
Attribute Qt::AA_ShareOpenGLContexts must be set before QCoreApplication is created.
qml: *** popupMode = 1920
qrc:/QML/DebugMenu.qml:262: TypeError: Cannot call method 'isWasabiEnabled' of undefined
qrc:/QML/DebugMenu.qml:289: TypeError: Cannot call method 'isSearchInCommunitiesForceEnabled' of undefined
qrc:/QML/DebugMenu.qml:296: TypeError: Cannot call method 'isOOABURISpamCheckerForceEnabled' of undefined
qrc:/QML/DebugMenu.qml:304: TypeError: Cannot call method 'isRateCallQualityForceEnabled' of undefined

We'll see prompts telling us to approve syncing on both PC and mobile applications:

Viber sync approval prompt on PC

Viber sync approval prompt on mobile phone
Viber sync start prompt on PC

After we approve syncing on both devices, syncing process will start:

Viber syncing message on mobile phone

Viber syncing message on PC
After the process completes, Viber on your PC will be synced with the mobile Viber app.

Sunday, 12 January 2020

Instance Segmentation


Input:
  • image
  • predefined set of categories


Predict the locations and identities of objects in that image, similar to object detection, but rather than just predicting a bounding box for each of those objects, we want to predict a whole segmentation mask for each object and say which pixels in the input image correspond to each object instance.

Instance Segmentation is the full problem, a hybrid between semantic segmentation and object detection: as in object detection we can handle multiple objects, and we differentiate the identities of different instances.


(Differentiate instances)

In the example above Instance Segmentation distinguishes between the three sheep instances.

The output is like in semantic segmentation, where we have pixel-wise accuracy, but here for each of these objects we also want to say which pixels belong to that object.


The idea is to get region and classification predictions (for each object) and then apply semantic segmentation onto each of these regions.

Mask R-CNN

And this ends up looking a lot like Faster R-CNN.

It has a multi-stage processing approach: the whole input image goes into some convolutional network and a learned region proposal network that's exactly the same as in Faster R-CNN. Once we have our learned region proposals (input image goes through CNN - RPN), we project those proposals onto our convolutional feature map, just like we did in Fast and Faster R-CNN.

But now, rather than just making a classification and bounding box regression decision for each of those boxes, we in addition want to predict a segmentation mask for each of those region proposals. So now it kind of looks like a semantic segmentation problem inside each of the region proposals that we're getting from our region proposal network.

Mask R-CNN Architecture
Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick: Mask R-CNN

After we do this RoI aligning to warp our features corresponding to the region proposal into the right shape, we have two different branches.

The first branch at the top looks just like Faster R-CNN: it predicts classification scores telling us what is the category corresponding to that region proposal (or alternatively whether it's background), and it also predicts bounding box coordinates that are regressed off the region proposal coordinates.

Mask R-CNN Architecture in detail

Image source: Stanford University School of Engineering - Convolutional Neural Networks for Visual Recognition - Lecture 11 | Detection and Segmentation

And now in addition we'll have this branch at the bottom which looks basically like a semantic segmentation mini network which will classify for each pixel in that input region proposal whether or not it's an object. This Mask R-CNN architecture just kind of unifies Faster R-CNN and Semantic Segmentation models into one nice jointly end-to-end trainable model.

It works really well, just look at the examples in the paper. They look kind of indistinguishable from ground truth.

Pose Estimation

Mask R-CNN also does pose estimation. You can do pose estimation by predicting these joint coordinates for each of the joints of the person.

Mask R-CNN can do joint object detection, pose estimation, and instance segmentation.
And the only addition we need to make is that for each of these region proposals we add an additional little branch that predicts these coordinates of the joints for the instance of the current region proposal.

Addition for pose estimation

Image source: Stanford University School of Engineering - Convolutional Neural Networks for Visual Recognition - Lecture 11 | Detection and Segmentation

As another layer has been added (another head coming out of the network) we need to add another loss to our multi-task loss.
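
To make the multi-task loss idea concrete, here is a minimal sketch (my own illustration; the struct, field names and numbers are hypothetical, not taken from the Mask R-CNN code base): each head of the network contributes its own loss term, and adding the keypoint (pose) head simply adds one more term to the sum that gets backpropagated.

#include <iostream>

// Hypothetical per-head loss values from one training step.
struct MaskRcnnLosses {
    double classification;  // object category scores (Faster R-CNN branch)
    double box_regression;  // bounding box offsets (Faster R-CNN branch)
    double mask;            // per-pixel instance mask branch
    double keypoint;        // joint-coordinate branch added for pose estimation
};

// The multi-task loss is just the sum of the per-head losses; adding the pose
// estimation head means adding one more term to this sum.
double total_loss(const MaskRcnnLosses& l) {
    return l.classification + l.box_regression + l.mask + l.keypoint;
}

int main() {
    MaskRcnnLosses losses{0.7, 0.4, 0.9, 0.3};
    std::cout << "total multi-task loss: " << total_loss(losses) << std::endl;  // prints 2.3
    return 0;
}

In practice the individual terms may also be weighted, but the principle of summing per-head losses stays the same.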

Because it's built on the Faster R-CNN framework, it runs relatively close to real time (something like 5 fps on a GPU), because this is all done in a single forward pass of the network.


How much training data do you need?

All of these instance segmentation results were trained on the Microsoft COCO dataset. Microsoft COCO has roughly 200,000 training images and 80 categories that it cares about, and in each of those 200,000 training images all the instances of those 80 categories are labeled. There are something like 200,000 images for training with, I think, an average of five or six instances per image, so it actually is quite a lot of data. And for all the people in Microsoft COCO they also have all the joints annotated as well, so this actually does have quite a lot of supervision at training time. It is trained with quite a lot of data.

Training: Future improvements

One really interesting topic to study moving forward: we are at this point relatively confident that, if you have a lot of data for some problem, you can stitch up some convolutional network that will probably do a reasonable job at it, but figuring out ways to get performance like this with less training data is a super interesting and active area of research. That's something people will be spending a lot of their efforts working on in the next few years.


Semantic Segmentation


Input:
  • image (pixels)
  • list of categories


Each pixel in the image is to be classified (assigned a category label).
We don't differentiate instances (objects); we only care about pixels.


Output:
  • Every input pixel is assigned a category
  • Pixels of each category are painted with the same color e.g. grass, cat, tree, sky
  • If two instances of the same object are next to each other, the entire area will have the same label and will be painted with the same color


Approach #1: Sliding Window 

Approach #1 is to use a sliding window: we move a small window across the image and apply DNN classification to determine the class of each crop, which is then assigned to the central pixel of the crop.

This would be very computationally expensive, as we'd need to classify (push through the CNN) a separate crop for each pixel in the image.

This would also be very inefficient because shared features between overlapping patches are not reused. If two patches overlap, the convolutional features of these patches end up going through the same convolutional layers, so we could actually share a lot of computation when applying this approach to separate patches of the image.
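
To make the cost argument explicit, here is a minimal sketch (my own code; classify_crop is a hypothetical stand-in for a full CNN forward pass on one crop) of the per-pixel loop this approach implies:

#include <iostream>
#include <vector>

// Hypothetical placeholder: in a real implementation this would run a full CNN
// forward pass on the patch centered at (center_y, center_x).
int classify_crop(const std::vector<std::vector<float>>& image,
                  int center_y, int center_x, int patch_size) {
    (void)image; (void)center_y; (void)center_x; (void)patch_size;
    return 0;  // dummy category label
}

// Sliding window segmentation: one CNN classification per pixel of the image.
std::vector<std::vector<int>> sliding_window_segmentation(
        const std::vector<std::vector<float>>& image, int patch_size) {
    const int H = static_cast<int>(image.size());
    const int W = static_cast<int>(image[0].size());
    std::vector<std::vector<int>> labels(H, std::vector<int>(W, 0));
    for (int y = 0; y < H; ++y)
        for (int x = 0; x < W; ++x)
            labels[y][x] = classify_crop(image, y, x, patch_size);  // one forward pass per pixel
    return labels;
}

int main() {
    std::vector<std::vector<float>> image(4, std::vector<float>(4, 0.0f));  // tiny 4 x 4 "image"
    auto labels = sliding_window_segmentation(image, 3);
    std::cout << labels.size() * labels[0].size() << " crops classified" << std::endl;  // prints 16
    return 0;
}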

Approach #2: CNN, layers keep spatial size

Use a fully convolutional network, where the whole network is a giant stack of convolutional layers with no fully connected layers, and each convolutional layer preserves the spatial size of the input:

input image --> [conv] --> output image
  • input: 3 x H x W
  • convolutions: D x H x W (conv --> conv --> conv --> conv)
  • scores: C x H x W (C is the number of categories/labels)
  • argmax ==> predictions: H x W
The final convolutional layer outputs a C x H x W tensor.
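
As a concrete illustration of the scores --> argmax --> predictions step above, here is a minimal sketch (my own code; it assumes the scores are available as plain nested vectors standing in for the C x H x W tensor) of turning per-pixel class scores into an H x W label map:

#include <iostream>
#include <vector>

// Per-pixel argmax over a C x H x W score tensor, producing an H x W map of
// predicted category labels.
std::vector<std::vector<int>> argmax_over_classes(
        const std::vector<std::vector<std::vector<float>>>& scores) {  // [C][H][W]
    const size_t C = scores.size();
    const size_t H = scores[0].size();
    const size_t W = scores[0][0].size();
    std::vector<std::vector<int>> predictions(H, std::vector<int>(W, 0));
    for (size_t y = 0; y < H; ++y) {
        for (size_t x = 0; x < W; ++x) {
            size_t best = 0;
            for (size_t c = 1; c < C; ++c)
                if (scores[c][y][x] > scores[best][y][x])
                    best = c;
            predictions[y][x] = static_cast<int>(best);  // category label of this pixel
        }
    }
    return predictions;
}

int main() {
    // Toy example: C = 2 categories, H = 1, W = 2 pixels.
    std::vector<std::vector<std::vector<float>>> scores = {
        {{0.9f, 0.1f}},   // scores of category 0 for the two pixels
        {{0.2f, 0.8f}}    // scores of category 1 for the two pixels
    };
    std::vector<std::vector<int>> labels = argmax_over_classes(scores);
    std::cout << labels[0][0] << " " << labels[0][1] << std::endl;  // prints: 0 1
    return 0;
}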

The size of the output image has to be the same as the input image, as we want a classification for each pixel; the output has to be pixel-perfect, with sharp and clear borders between segments.

All computations are done in one pass. 

Using convolutional layers which keep the same spatial size as the input image is super expensive and would take lots of memory and computation (high resolution input image, multi-channel input in each layer, ...).

Approach #3: CNN, downsampling + upsampling

Design the network as a bunch of convolutional layers, with downsampling and upsampling of the feature map inside the network:

input image --> [conv --> downsampling] --> [conv --> upsampling] --> output image

Downsampling:
  • spatial information gets lost
  • e.g. max pooling, strided convolution

Upsampling: max unpooling or strided transpose convolution.
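
As an illustration of what max unpooling does, here is a minimal sketch (my own code, not from the lecture): each value of the low-resolution feature map is written back to the position that produced the maximum in the corresponding max-pooling step, and every other position stays zero.

#include <iostream>
#include <utility>
#include <vector>

// `pooled` is the low-resolution feature map; `max_positions` remembers, for
// every pooled cell, the (y, x) position in the original H x W map that held
// the maximum during max pooling. The output restores those values and leaves
// zeros everywhere else.
std::vector<std::vector<float>> max_unpool(
        const std::vector<std::vector<float>>& pooled,
        const std::vector<std::vector<std::pair<int, int>>>& max_positions,
        int H, int W) {
    std::vector<std::vector<float>> out(H, std::vector<float>(W, 0.0f));
    for (size_t py = 0; py < pooled.size(); ++py)
        for (size_t px = 0; px < pooled[py].size(); ++px) {
            const std::pair<int, int>& pos = max_positions[py][px];
            out[pos.first][pos.second] = pooled[py][px];
        }
    return out;
}

int main() {
    // A single 2 x 2 region was max-pooled to the value 5.0, whose maximum sat
    // at position (1, 0) of the original map; unpooling puts it back there.
    std::vector<std::vector<float>> pooled = {{5.0f}};
    std::vector<std::vector<std::pair<int, int>>> positions = {{{1, 0}}};
    std::vector<std::vector<float>> restored = max_unpool(pooled, positions, 2, 2);
    for (const auto& row : restored) {
        for (float v : row) std::cout << v << " ";
        std::cout << std::endl;
    }
    // prints:
    // 0 0
    // 5 0
    return 0;
}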


Put classification loss at every pixel at the output, take an average through space and train it through normal back propagation.


Creating the training set is an expensive and long manual process. Each pixel has to be labelled. There are some tools for drawing contours and filling in the regions.

Loss Function

Loss function: a cross-entropy loss is computed between each pixel of the output and the corresponding ground-truth pixel; then the sum or average is taken over space and over the mini-batch.
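
For illustration, here is a minimal sketch (my own code; it assumes raw class scores laid out as C x H x W and ground-truth labels as H x W) of computing this per-pixel cross-entropy loss averaged over the image:

#include <cmath>
#include <iostream>
#include <vector>

// For every pixel, compute -log(softmax probability of the ground-truth class)
// from the raw scores, then average over all pixels of the image.
double segmentation_cross_entropy(
        const std::vector<std::vector<std::vector<double>>>& scores,  // [C][H][W] raw class scores
        const std::vector<std::vector<int>>& labels) {                // [H][W] ground-truth class per pixel
    const size_t C = scores.size();
    const size_t H = scores[0].size();
    const size_t W = scores[0][0].size();
    double total = 0.0;
    for (size_t y = 0; y < H; ++y) {
        for (size_t x = 0; x < W; ++x) {
            double denom = 0.0;                                // softmax denominator for this pixel
            for (size_t c = 0; c < C; ++c)
                denom += std::exp(scores[c][y][x]);
            const int gt = labels[y][x];
            total += std::log(denom) - scores[gt][y][x];       // -log softmax(correct class score)
        }
    }
    return total / static_cast<double>(H * W);                 // average over space
}

int main() {
    // Toy example: 2 categories, a single 1 x 2 "image", both pixels classified
    // confidently and correctly, so the loss is small (about 0.127).
    std::vector<std::vector<std::vector<double>>> scores = {
        {{2.0, 0.0}},   // category 0 scores of the two pixels
        {{0.0, 2.0}}    // category 1 scores of the two pixels
    };
    std::vector<std::vector<int>> labels = {{0, 1}};
    std::cout << segmentation_cross_entropy(scores, labels) << std::endl;
    return 0;
}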


Individual instances of the same category are not differentiated. This is improved with Instance Segmentation.


Stanford University School of Engineering - Convolutional Neural Networks for Visual Recognition - Lecture 11 | Detection and Segmentation - YouTube

Saturday, 11 January 2020

Object Detection with SSD

SSD (Single Shot Multibox Detector) is a method for object detection (object localization and classification) which uses a single Deep Neural Network (DNN). Single Shot means that object detection is performed in a single forward pass of the DNN.

This method was proposed by Wei Liu et al. in December 2015 and revised last time in December 2016: SSD: Single Shot MultiBox Detector.


Fast Object Detection.


The SSD network, built on the VGG-16 network, performs the task of object detection and localization in a single forward pass of the network. This approach discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. At prediction time, the network generates scores for the presence of each object category in each default box and produces adjustments to the box to better match the object shape. Additionally, the network combines predictions from multiple feature maps with different resolutions to naturally handle objects of various sizes. [source]

Here are some key points from the paper's abstract:
  • SSD uses single deep neural network
  • SSD discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location
BK: note here that the different aspect ratios and scales are not applied to anchor boxes in the image but to the feature map
  • At prediction time, the network generates scores for the presence of each object category in each default box and produces adjustments to the box to better match the object shape. Additionally, the network combines predictions from multiple feature maps with different resolutions to naturally handle objects of various sizes. 
  • Our SSD model is simple relative to methods that require object proposals because it completely eliminates proposal generation and subsequent pixel or feature resampling stage and encapsulates all computation in a single network. This makes SSD easy to train and straightforward to integrate into systems that require a detection component. 
  • Experimental results on the PASCAL VOC, MS COCO, and ILSVRC datasets confirm that SSD has comparable accuracy to methods that utilize an additional object proposal step and is much faster, while providing a unified framework for both training and inference. Compared to other single stage methods, SSD has much better accuracy, even with a smaller input image size. For 300×300 input, SSD achieves 72.1% mAP on VOC2007 test at 58 FPS on a Nvidia Titan X and for 500×500 input, SSD achieves 75.1% mAP, outperforming a comparable state of the art Faster R-CNN model. 

SSD Framework

Image source: Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg: SSD: Single Shot MultiBox Detector

We know that deeper Conv layers in CNNs extract/learn more complex features.
Feature maps preserve spatial structure of the input image but at lower resolution.

Lecture 11: Detection and Localization

If we take some CNN (like ResNet) pretrained for image recognition (image classification) and remove its last FC layers we'll get as its output a feature map as described above.

Now we can do something which YOLO does on the image - divide the feature map into grid cells and apply an equidistant detector which predicts anchor boxes.

Given our input image (3 * H * W) you imagine dividing that input image into some coarse S * S grid, and now within each of those grid cells you imagine some set of B base bounding boxes (e.g. B = 3 base bounding boxes like a tall one, a wide one, and a square one but in practice you would use more than three). These bounding boxes are centered at each grid cell.

Now for each of these grid cells (S x S) the network has to predict two things:

  • for each of these base bounding boxes (B): an offset off the base bounding box to predict what is the true location of the object off this base bounding box.
    • This prediction has two components:
      • bounding box coordinates: dx, dy, dh, dw
      • confidence
    • So the final output has B * 5 values
  • classification scores for each of C classes (including background as a class)

At the end we end up predicting from our input image this giant tensor:
S * S * (B * 5 + C)

So that's just where we have B base bounding boxes, we have five numbers for each giving our offset and our confidence for that base bounding box and C classification scores for our C categories.
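
To make the size of that tensor concrete, here is a small worked example (the values of S, B and C below are illustrative numbers I picked, not the settings of any particular detector):

#include <iostream>

int main() {
    // Worked example of the output tensor size S * S * (B * 5 + C).
    const int S = 7;   // grid cells per side
    const int B = 3;   // base bounding boxes per grid cell
    const int C = 21;  // classification scores per cell (20 object classes + background)
    // Each base box contributes 4 offsets (dx, dy, dh, dw) plus 1 confidence value.
    const int values_per_cell = B * 5 + C;                 // 3 * 5 + 21 = 36
    const int total_values = S * S * values_per_cell;      // 7 * 7 * 36 = 1764
    std::cout << values_per_cell << " values per grid cell, "
              << total_values << " values for the whole image" << std::endl;
    return 0;
}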

So then we kind of see object detection as this input of an image, output of this three dimensional tensor and you can imagine just training this whole thing with a giant convolutional network.

And that's essentially what these single shot methods do; matching the ground truth objects to these potential base boxes becomes a little bit hairy, but that's what these methods do.


SSD has two components:

  • base (backbone) model
  • SSD head

Backbone model:

  • usually a pre-trained image classification network as a feature extractor from which the final fully connected classification layer has been removed; such NN is able to extract semantic meaning from the input image while preserving the spatial structure of the image albeit at a lower resolution
  • VGG-16 or ResNet trained on ImageNet 

SSD head:

  • one or more convolutional layers added to the backbone
  • outputs are interpreted as the bounding boxes and classes of objects in the spatial locations of the final layer's activations

SSD vs YOLO Network Architecture
Image source: Wei Liu et al.: "SSD: Single Shot MultiBox Detector"


Tensorflow Object Detection API comes with pretrained models, and ssd_inception_v2_coco_2017_11_17 is one of them.

TensorRT/samples/opensource/sampleUffSSD at master · NVIDIA/TensorRT · GitHub
TensorFlow implementation of SSD, which actually differs from the original paper, in that it has an inception_v2 backbone. For more information about the actual model, download ssd_inception_v2_coco. The TensorFlow SSD network was trained on the InceptionV2 architecture using the MSCOCO dataset which has 91 classes (including the background class). The config details of the network can be found here.
Logo detection in Images using SSD - Towards Data Science
TensorFlow Object Detection API with Single Shot MultiBox Detector (SSD) - YouTube



Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg: SSD: Single Shot MultiBox Detector


RattyDAVE/pi-object-detection: Raspberry Pi Object detection.

SSD : Single Shot Detector for object detection using MultiBox

13.7. Single Shot Multibox Detection (SSD) — Dive into Deep Learning 0.7.1 documentation

Understanding SSD MultiBox — Real-Time Object Detection In Deep Learning

How single-shot detector (SSD) works? | ArcGIS for Developers

Is SSD really better than YOLO? - Quora

Review: SSD — Single Shot Detector (Object Detection)

SSD object detection: Single Shot MultiBox Detector for real-time processing

What do we learn from single shot object detectors (SSD, YOLOv3), FPN & Focal loss (RetinaNet)?