My Public Notepad: Running NVIDIA DIGITS Docker container on Ubuntu

Installing NVIDIA DIGITS directly on your computer means that you'll:

spend a considerable amount of time installing all the dependencies and building DIGITS itself
pollute your machine with another application and its dependencies

To avoid this, we can run the NVIDIA DIGITS Docker container instead. Let's first check whether Docker is installed and which version it is:

$ docker --version

Docker version 20.10.3, build 48d30b5
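If this check has to run unattended, for instance at the top of a setup script, it can be wrapped in a small helper. The sketch below is only an illustration: require_cmd is a hypothetical name of my own, not part of Docker, and it relies on nothing but the POSIX `command -v` builtin.

```shell
# Hypothetical helper (not part of Docker): succeed only if the named
# command can be found on PATH; otherwise print a hint to stderr.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "$1 is not installed or not on PATH" >&2
    return 1
}

# Example use: only ask for the version when docker is actually present.
# require_cmd docker && docker --version
```

The same helper can be reused for nvidia-smi or any other tool used later in this article.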

For reference, I ran the commands listed in this article on Ubuntu 20.04:

$ lsb_release -a

No LSB modules are available.

Distributor ID: Ubuntu

Description: Ubuntu 20.04.2 LTS

Release: 20.04

Codename: focal

Ideally, we'd be running NVIDIA DIGITS on a machine with one or more GPUs. This would speed up training and inference, but DIGITS can also work on a CPU-only machine.

I have a GeForce GT 640 graphics card:

$ nvidia-smi -L

GPU 0: GeForce GT 640 (UUID: GPU-f2583df9-404d-2564-d332-e7878a94d087)
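The model name is the part of this line that matters when looking up what a card supports. As a minimal sketch, assuming the `GPU <index>: <name> (UUID: ...)` line format shown above, the name can be extracted with sed. The function name gpu_names is hypothetical.

```shell
# Hypothetical helper: read `nvidia-smi -L` output on stdin and print only
# the model name of each GPU. Assumes the line format shown above.
gpu_names() {
    sed -n 's/^GPU [0-9][0-9]*: \(.*\) (UUID: .*)$/\1/p'
}

# Example use on the host:
# nvidia-smi -L | gpu_names
```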

$ lspci

...

VGA compatible controller: NVIDIA Corporation GK107 [GeForce GT 640 OEM] (rev a1)

...

GK107 is the code name for the GeForce GT 640 (GDDR5) (source: GeForce 600 series - Wikipedia) which, according to CUDA GPUs | NVIDIA Developer, has compute capability 3.5. That is enough for the NVIDIA Container Toolkit, which requires a compute capability greater than 2.1 according to Installation Guide — NVIDIA Cloud Native Technologies documentation.

To test the local GPU we can run the nvidia-smi application either directly on the host or inside a Docker container.

If we haven't installed CUDA or nvidia-smi locally, we can run nvidia-smi from the NVIDIA CUDA Docker image. Note that the --gpus option only works once the NVIDIA Container Toolkit is installed, which is covered further down:

$ sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi

Thu Feb 11 01:02:09 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GT 640      Off  | 00000000:01:00.0 N/A |                  N/A |
| 40%   31C    P8    N/A /  N/A |    286MiB /  1992MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Let's now follow the instructions from DIGITS | NVIDIA NGC. We first need to download the image to our local host:

$ docker pull nvcr.io/nvidia/digits:20.12-tensorflow-py3

20.12-tensorflow-py3: Pulling from nvidia/digits

6a5697faee43: Pulling fs layer

ba13d3bc422b: Pulling fs layer

...

cec6045b0d0e: Pulling fs layer

cb4aa708e833: Waiting

235cfa23a5f4: Waiting

24781a3c82ea: Waiting

f7c7d47c1a97: Pull complete

...

b57dde2f2923: Pull complete

Digest: sha256:7542143bc2292fc48a3874786877815a5ca6a74a69366324aaf66914155cb5a7

Status: Downloaded newer image for nvcr.io/nvidia/digits:20.12-tensorflow-py3

nvcr.io/nvidia/digits:20.12-tensorflow-py3

Let's now run the container. docker run has a --gpus option, which instructs Docker to add GPU devices to the container ('all' passes all GPUs).

$ docker run --gpus all -it --rm nvcr.io/nvidia/digits:20.12-tensorflow-py3

docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].

The error appeared because I hadn't installed the NVIDIA Container Toolkit (nvidia-docker), which enables Docker containers to access the host's GPU. Installation Guide — NVIDIA Cloud Native Technologies documentation describes how to install it:

$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
   && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
   && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

$ sudo apt-get update

$ sudo apt-get install -y nvidia-docker2

$ sudo systemctl restart docker
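After the restart, one way to confirm that the installation took effect is to look at the runtimes the Docker daemon reports. As far as I understand, the nvidia-docker2 package registers a runtime named nvidia, and `docker info --format '{{json .Runtimes}}'` prints the registered runtimes as JSON. The sketch below assumes exactly that; has_nvidia_runtime is a hypothetical helper name, and it does a plain text match rather than real JSON parsing.

```shell
# Hypothetical helper: read the JSON printed by
#   docker info --format '{{json .Runtimes}}'
# on stdin and succeed if a runtime named "nvidia" is listed.
# Assumption: nvidia-docker2 registers its runtime under that name.
has_nvidia_runtime() {
    grep -q '"nvidia"'
}

# Example use on the host:
# docker info --format '{{json .Runtimes}}' | has_nvidia_runtime \
#     && echo "nvidia runtime is registered"
```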

$ nvidia-docker version

NVIDIA Docker: 2.5.0

Client: Docker Engine - Community

Version: 20.10.3

API version: 1.41

Go version: go1.13.15

Git commit: 48d30b5

Built: Fri Jan 29 14:33:21 2021

OS/Arch: linux/amd64

Context: default

Experimental: true

Server: Docker Engine - Community

Engine:

Version: 20.10.3

API version: 1.41 (minimum version 1.12)

Go version: go1.13.15

Git commit: 46229ca

Built: Fri Jan 29 14:31:32 2021

OS/Arch: linux/amd64

Experimental: false

containerd:

Version: 1.4.3

GitCommit: 269548fa27e0089a8b8278fc4fc781d7f65a939b

runc:

Version: 1.0.0-rc92

GitCommit: ff819c7e9184c13b7c2607fe6c30ae19403a7aff

docker-init:

Version: 0.19.0

GitCommit: de40ad0

To be on the safe side, I also installed the latest NVIDIA driver and then ran nvidia-smi in the CUDA container:

$ sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi

[sudo] password for bojan:

Thu Feb 11 01:02:09 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GT 640      Off  | 00000000:01:00.0 N/A |                  N/A |
| 40%   31C    P8    N/A /  N/A |    286MiB /  1992MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

This time the DIGITS container started successfully. The DIGITS 6 HTTP server uses port 5000 by default; in this example it is mapped to host port 8888.

$ docker run --gpus all -it --rm -p 8888:5000 nvcr.io/nvidia/digits:20.12-tensorflow-py3

============

== DIGITS ==

============

NVIDIA Release 20.12 (build 17912121)

DIGITS Version 6.1.1

NVIDIA modifications are covered by the license terms that apply to the underlying project or file.

ERROR: No supported GPU(s) detected to run this container

  ___ ___ ___ ___ _____ ___
 |   \_ _/ __|_ _|_   _/ __|
 | |) | | (_ || |  | | \__ \
 |___/___\___|___| |_| |___/ 6.1.1

Caffe support disabled.

Reason: A valid Caffe installation was not found on your system.

cudaRuntimeGetVersion() failed with error #999

2021-02-11 16:23:54.454747: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0

WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.

/opt/digits/digits/pretrained_model/views.py:32: SyntaxWarning: "is" with a literal. Did you mean "=="?

if str(files['weights_file'].filename) is '':

/opt/digits/digits/pretrained_model/views.py:38: SyntaxWarning: "is" with a literal. Did you mean "=="?

if str(files['model_def_file'].filename) is '':

/opt/digits/digits/pretrained_model/views.py:54: SyntaxWarning: "is" with a literal. Did you mean "=="?

if str(files['weights_file'].filename) is '':

/opt/digits/digits/pretrained_model/views.py:60: SyntaxWarning: "is" with a literal. Did you mean "=="?

if str(files['model_def_file'].filename) is '':

/opt/digits/digits/pretrained_model/views.py:169: SyntaxWarning: "is" with a literal. Did you mean "=="?

elif str(flask.request.form['job_name']) is '':

/opt/digits/digits/pretrained_model/views.py:177: SyntaxWarning: "is not" with a literal. Did you mean "!="?

if str(flask.request.files['labels_file'].filename) is not '':

2021-02-11 16:23:56 [INFO ] Loaded 0 jobs.

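If we now open a browser on the host and go to http://localhost:8888, we'll see the DIGITS home page.

As DIGITS is a web-based application, we don't need to run it in interactive mode (docker run -it) and can run it in detached mode (docker run -d) instead. In the command below, --name gives the container a fixed name (digits) and the two -v options mount host directories for data and jobs into the container: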

$ docker run \
    --gpus all \
    -d \
    --name digits \
    --rm \
    -p 8888:5000 \
    -v /home/bojan/dev/digits-demo/data:/data \
    -v /home/bojan/dev/digits-demo/jobs:/workspace/jobs \
    nvcr.io/nvidia/digits:20.12-tensorflow-py3

905f9a8c8e48bc87ae99117eed92b855d45c7d37695c0e94433bd18fab6bfaca
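The command above hard-codes my own host paths. A minimal sketch of a reusable wrapper is shown below, assuming the same image tag and container paths as above. run_digits and the DOCKER override variable are hypothetical conventions of mine, not anything provided by Docker or DIGITS. Setting DOCKER=echo turns the call into a dry run that only prints the arguments.

```shell
# Hypothetical wrapper around the docker run command shown above.
# $1: host directory with datasets, mounted at /data
# $2: host directory for DIGITS jobs, mounted at /workspace/jobs
# $3: host port mapped to the container's port 5000 (defaults to 8888)
# The DOCKER variable (an assumption of this sketch) selects the binary,
# so DOCKER=echo prints the arguments instead of starting a container.
run_digits() {
    data_dir=$1
    jobs_dir=$2
    host_port=${3:-8888}
    "${DOCKER:-docker}" run \
        --gpus all \
        -d \
        --name digits \
        --rm \
        -p "${host_port}:5000" \
        -v "${data_dir}:/data" \
        -v "${jobs_dir}:/workspace/jobs" \
        nvcr.io/nvidia/digits:20.12-tensorflow-py3
}

# Example use:
# run_digits /home/bojan/dev/digits-demo/data /home/bojan/dev/digits-demo/jobs
```

Because the container is started with --rm, stopping it with docker stop digits also removes it, while the mounted data and jobs directories stay on the host.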

We can verify that the DIGITS container is indeed running:

$ docker ps

CONTAINER ID   IMAGE                                        COMMAND                  CREATED              STATUS              PORTS                                                  NAMES
905f9a8c8e48   nvcr.io/nvidia/digits:20.12-tensorflow-py3   "/usr/local/bin/nvid…"   About a minute ago   Up About a minute   6006/tcp, 6064/tcp, 8888/tcp, 0.0.0.0:8888->5000/tcp   digits

Why doesn't DIGITS recognize my GPU?

One thing didn't seem right to me, though. The upper right corner of the DIGITS home page should show how many GPUs are available. In my case, although I have one GPU, none were listed.

I first checked whether the GPU was visible from inside the container:

$ docker exec -it digits bash

root@e58b860504a9:/workspace#

root@e58b860504a9:/workspace# nvidia-smi

Fri Feb 12 23:33:17 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GT 640      Off  | 00000000:01:00.0 N/A |                  N/A |
| 40%   32C    P8    N/A /  N/A |    260MiB /  1992MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

The graphics card was visible. The DIGITS installation contains a Python script, device_query.py, which queries the available devices (source code: python/9427/DIGITS/digits/device_query.py). When I ran it, I got an error:

root@e58b860504a9:/opt/digits/digits# python device_query.py

cudaRuntimeGetVersion() failed with error #999

No devices found.

From CUDA Runtime API :: CUDA Toolkit Documentation:

cudaErrorUnknown = 999
This indicates that an unknown internal error has occurred.

The CUDA toolkit inside the container was installed correctly:

root@6cd6c429f20c:/workspace# nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver

Built on Mon_Oct_12_20:09:46_PDT_2020

Cuda compilation tools, release 11.1, V11.1.105

Build cuda_11.1.TC455_06.29190527_0

On the host system I checked if loading the NVIDIA driver gave any errors (NVRM errors are internal to the nvidia kernel module):

$ sudo dmesg |grep NVRM

[sudo] password for bojan:

[ 2.283911] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 460.32.03 Sun Dec 27 19:00:34 UTC 2020

[ 8654.742795] NVRM: GPU at PCI:0000:01:00: GPU-f2583df9-404d-2564-d332-e7878a94d087

[ 8654.742800] NVRM: Xid (PCI:0000:01:00): 31, pid=577, Ch 00000002, intr 10000000. MMU Fault: ENGINE HOST4 HUBCLIENT_HOST faulted @ 0x1_01160000. Fault is of type FAULT_INFO_TYPE_UNSUPPORTED
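The number that follows the PCI address in an NVRM: Xid line (31 in the log above) is the Xid error code, which is the key to look up in NVIDIA's Xid documentation. As a minimal sketch, assuming the line format shown above, the codes can be pulled out of a longer log with sed. xid_codes is a hypothetical helper name.

```shell
# Hypothetical helper: read kernel log lines on stdin and print the Xid
# code from every "NVRM: Xid (<pci address>): <code>, ..." line.
# Lines that do not match this format are ignored.
xid_codes() {
    sed -n 's/.*NVRM: Xid ([^)]*): \([0-9][0-9]*\),.*/\1/p'
}

# Example use on the host, counting how often each code occurred:
# sudo dmesg | xid_codes | sort -n | uniq -c
```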

I could not deduce anything useful from this, but by reading the DIGITS release notes I finally found the reason why DIGITS won't recognize my GPU: it is too old!

Installation Guide — NVIDIA Cloud Native Technologies documentation specifies the compute capability requirements for the NVIDIA Container Toolkit, but the requirements for the DIGITS Docker image are specified separately for each image release. For digits:20.12, DIGITS Release Notes :: NVIDIA Deep Learning DIGITS Documentation states the following:

Release 20.12 supports CUDA compute capability 6.0 and higher.

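My GPU has compute capability 3.5, so it does not meet that requirement. This also explains the "ERROR: No supported GPU(s) detected to run this container" message printed when the container started.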
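The version comparison that tripped me up is simple enough to script. Below is a minimal sketch, assuming capabilities are always written as major.minor (for example 3.5 or 6.0). cc_at_least is a hypothetical helper name of my own, not something shipped with DIGITS or CUDA.

```shell
# Hypothetical helper: succeed if compute capability $1 is at least $2.
# Assumes both arguments have the form major.minor, e.g. 3.5 or 6.0.
cc_at_least() {
    have_major=${1%%.*}
    have_minor=${1#*.}
    need_major=${2%%.*}
    need_minor=${2#*.}
    # A higher major version always satisfies the requirement.
    if [ "$have_major" -gt "$need_major" ]; then
        return 0
    fi
    # With equal major versions, the minor version decides.
    if [ "$have_major" -eq "$need_major" ] && [ "$have_minor" -ge "$need_minor" ]; then
        return 0
    fi
    return 1
}

# Example use with the values from this article:
# cc_at_least 3.5 6.0 || echo "GPU is too old for digits:20.12"
```

With my card, cc_at_least 3.5 6.0 fails while cc_at_least 3.5 2.1 succeeds, which matches the toolkit accepting the GPU and the DIGITS image rejecting it. Newer drivers can report the capability directly via nvidia-smi --query-gpu=compute_cap --format=csv,noheader, but I have not verified that the 460 driver used here supports that field.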

References

Deep Learning for Object Detection with DIGITS | NVIDIA Developer Blog

Containers For Deep Learning Frameworks User Guide :: NVIDIA Deep Learning Frameworks Documentation

NVIDIA Docker: GPU Server Application Deployment Made Easy | NVIDIA Developer Blog

DIGITS Release Notes :: NVIDIA Deep Learning DIGITS Documentation

GeForce GT 640 (OEM) | Specifications | GeForce

DIGITS Docker container not picking up GPU - Deep Learning (Training & Inference) / DIGITS - NVIDIA Developer Forums



Sunday, 14 March 2021
