Introduction to Dockerfile

Saturday, 12 September 2020

Docker images are built on a layered filesystem: every ADD, COPY and RUN instruction creates a new layer, and Docker caches each layer so that unchanged steps can be reused in later builds.

ADD

Takes a source (src) and a destination
Lets you copy files/directories into the Docker image from the following sources:

local file or directory from your host (the machine building the Docker image)
local tar archive on your host, which is extracted directly into the destination
URL (the remote file is downloaded but not extracted)

A valid use case for ADD is when you want to extract a local tar file into a specific directory in your Docker image
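
As a rough sketch of that use case (the archive name app-bundle.tar.gz, the base image tag and the target paths are assumptions made for illustration), ADD unpacks a local tar archive while COPY leaves it untouched:

```dockerfile
FROM alpine:3.20

# ADD recognises a local tar archive (gzip, bzip2 or xz compressed as well)
# and unpacks its contents into /opt/app/
ADD app-bundle.tar.gz /opt/app/

# COPY never unpacks: the archive lands in /tmp/ as a single file
COPY app-bundle.tar.gz /tmp/
```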

ARG

Defines a variable that users can pass at build time to the builder with the docker build command, using the --build-arg <varname>=<value> flag
docker build always prints the instruction exactly as it is written in the Dockerfile, regardless of the variable's value: the ARG value is not substituted in the terminal output. [ARG substitution in RUN command not working for Dockerfile]
A value provided on the docker build command line overrides the (default) one set in the Dockerfile.
WARNING: an ARG declared before FROM is outside of any build stage, so its value is empty after FROM unless the ARG is declared again (without a value) inside the stage [ARG before FROM in Dockerfile doesn't behave as expected · Issue #34129 · moby/moby]
ARG variables are available at build time (RUN, COPY etc.). Unlike ENVs, they are not persisted as environment variables in the image and therefore can't be used in CMD or ENTRYPOINT commands.

Example:

Dockerfile:

ARG NPM_LOG_LEVEL=warn

RUN npm install --loglevel ${NPM_LOG_LEVEL}

Terminal:

$ docker build --pull --build-arg NPM_LOG_LEVEL=verbose -t my_app_image .
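
The WARNING above about ARG and FROM can be illustrated with a small sketch. BASE_TAG and its default value are hypothetical names chosen for this example:

```dockerfile
# Declared before FROM: at this point it is usable only in FROM lines
ARG BASE_TAG=3.20
FROM alpine:${BASE_TAG}

# Inside the stage ${BASE_TAG} would expand to an empty string,
# until ARG is declared again without a value; this picks up the default
# (or the --build-arg value) that was declared before FROM
ARG BASE_TAG
RUN echo "Base image tag: ${BASE_TAG}"
```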

To pass ARG values into the container we need to use ENV variables:

ARG APP_NAME=mysqlsh-demo

ARG DOCKER_ENTRYPOINT=docker-entrypoint.sh
...

# ARGs are available only at build time, not at run time, so we need to pass their values to ENVs:
ENV APP_NAME=${APP_NAME}
ENV DOCKER_ENTRYPOINT=${DOCKER_ENTRYPOINT}

ENTRYPOINT "/usr/src/${APP_NAME}/${DOCKER_ENTRYPOINT}"

COPY

99% of the time you should prefer using COPY to ADD
Takes in a src and destination
Only lets you copy in a local file or directory from your host (the machine building the Docker image) into the Docker image itself. [Docker Tip #2: The Difference between COPY and ADD in a Dockerfile]

NOTE for ADD & COPY:
By default, both instructions set root:root as the owner of new files in the image [Dockerfile: ADD does not honor USER: files always owned by root #6119] [Docker Copy and change owner]
All new files and directories are created with a UID and GID of 0, unless the optional --chown flag specifies a given username, groupname, or UID/GID combination to request specific ownership of the content added.
Examples:
ADD --chown=someuser:somegroup /foo /bar

COPY --chown=someuser:somegroup /foo /bar

Or other combinations of user/group name (or ID):

--chown=someuser:123

--chown=anyuser:anygroup

--chown=1001:1002

--chown=333:agroupname
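
A minimal sketch of how --chown is typically combined with a non-root user. It assumes the official node image, which (as noted in the USER section) ships with a node user and group; the image tag and the file name are illustrative:

```dockerfile
FROM node:20-alpine

WORKDIR /home/node/app

# Without --chown the copied file would belong to root (UID 0, GID 0),
# even if a USER instruction had been given earlier
COPY --chown=node:node package.json ./

# From here on RUN, CMD and ENTRYPOINT execute as the non-root user
USER node
```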

CMD

Lets you define a default command to run when your container starts
Executed at run time; does not execute anything at build time
Sets the default command and/or parameters, which can be overridden from the command line when the Docker container runs
Has three forms:

Exec (preferred): CMD ["executable","param1","param2"]
Shell: CMD command param1 param2
ENTRYPOINT's default parameters list (no binary!): CMD ["param1","param2"]

Exec form executes stated executable and passes to it params listed.
Shell form invokes a command shell (e.g. sh -c) and passes both command (executable) and its params to it.
When used in the shell or exec formats, the CMD instruction sets the command to be executed when running the image.
If you would like your container to run the same executable every time, then you should consider using ENTRYPOINT in combination with CMD.
If the user specifies arguments to docker run then they will override the default specified in CMD.
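
The practical difference between the shell and exec forms shows up with environment variables. In the sketch below GREETING is a made-up variable; only the last CMD in a Dockerfile takes effect, so the alternatives are commented out:

```dockerfile
FROM alpine:3.20
ENV GREETING=hello

# Shell form: runs through /bin/sh -c, so the shell expands the variable
# and the container prints: hello
CMD echo "$GREETING"

# Exec form: no shell is involved, so the literal text $GREETING is printed
# CMD ["echo", "$GREETING"]

# Exec form that invokes a shell explicitly, which restores the expansion
# CMD ["sh", "-c", "echo $GREETING"]
```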

ENTRYPOINT

Has two forms:

Exec (preferred): ENTRYPOINT ["executable", "param1", "param2"]
Shell: ENTRYPOINT command param1 param2

Exec form executes stated executable and passes to it params listed.
Shell form invokes a command shell (e.g. sh -c) and passes both command (executable) and its params to it.
Configures a container that will run as an executable. It should be used if the container is intended to run the same executable every time. This means that we can pass arguments to the executable set as the entrypoint simply by listing them after the name of the image: $ docker run ... <image_name> param1 param2...
Default values of arguments can be specified with CMD instruction in JSON array format:

CMD ["param1", "param2"]

Executed at run time
From Docker best practices:

The best use for ENTRYPOINT is to set the image’s main command, allowing that image to be run as though it was that command (and then use CMD as the default flags).

Let’s start with an example of an image named the same as the binary (s3cmd) for the command line tool s3cmd:

ENTRYPOINT ["s3cmd"]
CMD ["--help"]

Now the image can be run like this to show the command’s help:

$ docker run s3cmd

Or using the right parameters to execute a command:

$ docker run s3cmd ls s3://mybucket

This is useful because the image name can double as a reference to the binary as shown in the command above.

If we named the image my-s3cmd-image, we'd run the container as:

# These arguments get passed to s3cmd
docker run my-s3cmd-image ls s3://mybucket
docker run my-s3cmd-image sync s3://source s3://dest
docker run my-s3cmd-image --help

To override the entrypoint (if you need to), we'd use:

docker run --entrypoint /bin/bash my-s3cmd-image

We always need the image name in docker run. The ENTRYPOINT just pre-configures what command runs inside that image when it starts.

Nice example of entrypoint.sh.

This is an example of how an operator (the person running a container from an image) can pass arguments to the executable that runs when the container launches:

Dockerfile:

...
ENTRYPOINT [ "/my-app" ]
CMD [ "--param1=arg1_default" ]

Launching the container:

$ docker run ... my-app-image --param1=arg1_value

arg1_value will override param1's default value (arg1_default).

If --param1 is omitted then arg1_default will be applied to param1 which will be passed to my-app executable.

sed - Passing variable from container start to file - Stack Overflow

While they seem similar, ENTRYPOINT and CMD serve different purposes and behave differently.

Key Difference:

ENTRYPOINT: Defines the executable that will always run. Arguments are appended to it.
CMD: Provides default arguments that can be completely replaced.

Practical Examples

With CMD:

dockerfile:

CMD ["s3cmd", "ls", "s3://mybucket"]

bash:

# Uses the default

docker run my-image

# Runs: s3cmd ls s3://mybucket

# Completely replaces CMD

docker run my-image echo "hello"

# Runs: echo "hello" (NOT s3cmd!)

With ENTRYPOINT:

dockerfile:

ENTRYPOINT ["s3cmd"]

bash:

# No default arguments are defined, so provide them on the command line

docker run my-image ls s3://mybucket

# Runs: s3cmd ls s3://mybucket

# Still runs s3cmd, just different args

docker run my-image --help

# Runs: s3cmd --help

# Even this runs s3cmd

docker run my-image echo "hello"

# Runs: s3cmd echo "hello" (probably an error!)

Best Practice: Use Both Together

dockerfile:

ENTRYPOINT ["s3cmd"]

CMD ["--help"]

This way:

s3cmd always runs (ENTRYPOINT)
If no arguments provided, it shows help (CMD as default)
Any arguments you provide replace CMD but still go to s3cmd

bash:

docker run my-image # Runs: s3cmd --help

docker run my-image ls s3://bucket # Runs: s3cmd ls s3://bucket

Use ENTRYPOINT when you want your container to behave like a specific command-line tool.

Use CMD when you want a default command that users might want to completely override.

With CMD ["s3cmd", "ls", "s3://mybucket"], you can replace all the arguments by simply providing new ones after the image name:

bash:

# Replace with completely different command

docker run my-image echo "hello"

# Runs: echo "hello"

# Replace with different s3cmd arguments

docker run my-image s3cmd sync s3://source s3://dest

# Runs: s3cmd sync s3://source s3://dest

# Replace with a shell

docker run my-image /bin/bash

# Runs: /bin/bash

Important:

With CMD, whatever we provide after the image name completely replaces the entire CMD instruction. We're not appending to it - we're replacing it entirely.

So if we want to run a different s3cmd command, we need to include s3cmd again in our docker run command because we're replacing the whole thing.

This is why for tools like s3cmd, using ENTRYPOINT is often better:

dockerfile:

ENTRYPOINT ["s3cmd"]

CMD ["ls", "s3://mybucket"]

Then we can do:

docker run my-image # Runs: s3cmd ls s3://mybucket

docker run my-image sync s3://a s3://b # Runs: s3cmd sync s3://a s3://b

We only need to provide the arguments, not the s3cmd command itself again.

EXPOSE

Used optionally, only for documenting and giving a hint to whoever runs "docker run" which port should be published (with -p/--publish or -P/--publish-all).

...

EXPOSE 8080

...

This means that when we run the container, we should publish this port to the outside world. We have two options for this:

-p, --publish - maps a host port we specify (manually assign) to a running container port

docker run -p local_port:container_port
Example: docker run -p 8080:8080

-P, --publish-all - publishes all exposed ports to host ports that Docker picks at random from the ephemeral port range (available high-order ports, on Linux typically 32768 and above)

Example: docker run -P

In both cases a firewall rule is created which maps a container port to a port on the Docker host.
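
To make the distinction between exposing and publishing concrete, here is a hedged sketch; the port numbers, the UDP listener and the image name my-image are assumptions for illustration:

```dockerfile
FROM alpine:3.20

# Documents that the application listens on TCP port 8080 (TCP is the default)
EXPOSE 8080

# The protocol can be stated explicitly, e.g. for a UDP listener
EXPOSE 8125/udp

# Neither line publishes anything. Publishing happens at run time:
#   docker run -p 9000:8080 my-image   (host port 9000 -> container port 8080)
#   docker run -P my-image             (every EXPOSEd port -> random host port)
```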

| Docker Documentation
What is the difference between "expose" and "publish" in Docker? - Stack Overflow
Docker Tip #59: Difference between Exposing and Publishing Ports — Nick Janetakis

FROM

Sets the base image to use for subsequent instructions
Must be the first instruction in a Dockerfile; only ARG instructions (plus comments and parser directives) may precede it.

FROM baseImage
FROM baseImage:tag
FROM baseImage@digest

If we don't want to pull the base image from Docker Hub but from some custom registry, we can write:

FROM docker.example.com/image_name

Q: If a Dockerfile contains e.g. FROM ubuntu:latest, does the Docker image contain the Ubuntu OS?

A Docker image built from FROM ubuntu:latest does contain the Ubuntu OS - but not in the way you might expect from a traditional virtual machine or full operating system installation.

Here's what's actually happening:

What Ubuntu in Docker includes:

The Ubuntu filesystem (directories like /bin, /usr, /etc)
Ubuntu's package manager (apt)
Core Ubuntu libraries and utilities (see below)
The Ubuntu userland tools and commands

What Ubuntu in Docker excludes:

The Linux kernel (Docker shares the host's kernel)
Hardware drivers
Init systems like systemd (usually)
Many background services that run on a full Ubuntu system

When you run docker run -it ubuntu:latest /bin/bash, you're getting a genuine Ubuntu environment - you can use apt update, install packages, and run Ubuntu-specific commands. But it's a minimal, containerized version.

The confusion often comes from the size difference. A full Ubuntu desktop installation might be several gigabytes, while the Ubuntu Docker base image is only around 70MB. This is because:

No GUI components - no desktop environment, graphical applications, or display drivers
Minimal package set - only essential system packages
Shared kernel - the kernel and its modules are not part of the image; they come from the host
No unnecessary services - no background daemons or services that aren't needed

So your Docker container absolutely contains Ubuntu - just the parts that matter for running applications in a containerized environment.

Core Ubuntu libraries and utilities in a Docker base image include:

Essential System Libraries:

glibc - The GNU C Library, fundamental for all C programs
libssl - OpenSSL cryptographic library for secure communications
zlib - Compression library used by many applications
ncurses - Library for text-based user interfaces
readline - Library for command-line editing and history

Core System Utilities:

bash - The default shell
coreutils - Essential commands like ls, cp, mv, mkdir, cat
grep - Text search utility (a separate package, not part of coreutils)
findutils - Tools like find and xargs
sed and awk - Text processing utilities
tar and gzip - Archive and compression tools
wget or curl - For downloading files (not preinstalled in the official ubuntu image; add them with apt install when needed)
ps, top - Process monitoring tools

Package Management:

apt - Ubuntu's package manager
dpkg - Low-level package management tool
Essential package databases - Information about installed packages

System Configuration:

passwd and shadow utilities - User account management
mount utilities - Filesystem mounting tools
Basic networking tools - Like ping, netstat (not preinstalled in the official ubuntu image; available via apt install iputils-ping net-tools)

File System Structure:

Standard Linux directory hierarchy (/bin, /usr, /etc, /var, /tmp)
Essential configuration files in /etc
Device files in /dev

The Ubuntu base image is built from "Ubuntu Base," which "delivers a functional user-space environment, with full support for installation of additional software" [Base - Ubuntu Wiki] while remaining minimal. It's essentially everything you need to run Ubuntu applications and install additional packages, but without the desktop environment, development tools, or server-specific services that come with full Ubuntu installations.

This is why you can run commands like apt update && apt install python3 in an Ubuntu container - all the core infrastructure is there, just streamlined for containerized environments.

FROM ... AS ...

The FROM ... AS syntax in a Dockerfile is used in multi-stage builds. It allows naming a build stage so that artifacts from one stage can be selectively copied into another stage. This is particularly useful for optimizing your final image by separating the build environment from the runtime environment, keeping the final image small and clean.

When to use FROM ... AS?

When you have multiple build stages in your Dockerfile.
To create a named stage for building your application or artifacts (e.g., a "builder" stage).
To refer to that named stage later in the Dockerfile using COPY --from=<name> to copy only the necessary files into the final image.
To avoid including build tools and intermediate files in the final image.
To improve maintainability and readability, since named stages avoid counting numeric indices when copying from earlier stages.

Example usage:

FROM golang:1.24 AS build

WORKDIR /src

COPY . .

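# CGO_ENABLED=0 produces a statically linked binary, so it also runs on Alpine (musl libc)
RUN CGO_ENABLED=0 go build -o /bin/myapp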

FROM alpine:latest

COPY --from=build /bin/myapp /bin/myapp

CMD ["/bin/myapp"]

In this example:

The first stage named build compiles a Go application.
The final stage uses a minimal base image and copies the compiled binary from the build stage.
This results in a smaller image since build tools and source files are excluded.

Benefits:

Keeps final images small and secure by omitting unnecessary build dependencies.
Simplifies complex build processes.
Enables debugging or partial builds by targeting specific stages.

This mechanism is documented officially by Docker as the primary purpose of FROM ... AS in defining named stages for multi-stage builds.
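
The benefit of targeting specific stages might look like the sketch below. The test stage and the stage name final are hypothetical additions to the example above; --target is the docker build flag that selects the stage to stop at:

```dockerfile
FROM golang:1.24 AS build
WORKDIR /src
COPY . .
# CGO_ENABLED=0 keeps the binary statically linked so it runs on Alpine
RUN CGO_ENABLED=0 go build -o /bin/myapp

# A stage can start from an earlier named stage
FROM build AS test
RUN go test ./...

FROM alpine:latest AS final
COPY --from=build /bin/myapp /bin/myapp
CMD ["/bin/myapp"]

# Build and run only up to the test stage:
#   docker build --target test .
# Build the final image (with BuildKit the unused test stage is skipped):
#   docker build -t myapp .
```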

LABEL

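Adds metadata to an image as key-value pairs, e.g. the name of the image author (this replaces the deprecated MAINTAINER instruction).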
LABEL maintainer="author@example.com"

This info is shown in docker inspect output.
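
Several labels can be set in one instruction. In this sketch all keys other than maintainer and all values are illustrative; org.opencontainers.image.* is the key namespace defined by the OCI image specification, and my-app-image is an assumed image name:

```dockerfile
FROM alpine:3.20

# One LABEL instruction can carry multiple key=value pairs
LABEL maintainer="author@example.com" \
      org.opencontainers.image.title="my-app" \
      org.opencontainers.image.version="1.0.0"

# Read the labels back from a built image:
#   docker inspect --format '{{ json .Config.Labels }}' my-app-image
```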

RUN

Execute commands inside of Docker image
These commands get executed once at build time and get written into your Docker image as a new layer
Often used for installing software packages
Can be specified in two forms:

shell form, when it specifies arguments of /bin/sh -c

e.g. RUN echo "test"

exec form, when it specifies an executable and list of its arguments

e.g. RUN ["/bin/my_app", "arg1", "arg2"]

Unlike CMD, it cannot be overridden by a command specified in docker run, because it has already been executed at build time
It's a good practice to chain related commands into a single RUN instruction: the image then gets one layer instead of several, and files removed within the same instruction (e.g. package caches) never end up in a layer, so the image will have a smaller size [Docker Tip #3: Chain Your Docker RUN Instructions to Shrink Your Images — Nick Janetakis]
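
A common shape of such a chained instruction, sketched here for a Debian/Ubuntu based image (the base image tag and the package choice are only examples):

```dockerfile
FROM ubuntu:24.04

# Update, install and clean up in ONE layer; if the cleanup ran in a separate
# RUN instruction, the package lists would still be stored in an earlier layer
RUN apt-get update && \
    apt-get install -y --no-install-recommends ca-certificates curl && \
    rm -rf /var/lib/apt/lists/*
```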

Sometimes we need to replace some text in a file. This is how to define a default replacement string, perform the replacement with sed, and inject the replacement string from docker build args:

Dockerfile:

ARG DOCKER_IMAGE_REGISTRY=docker.example.com
FROM ${DOCKER_IMAGE_REGISTRY}/golang:alpine AS build
# If we add
# RUN cat /etc/apk/repositories
# this will show that default apk package repositories are HTTP ones:
# http://dl-cdn.alpinelinux.org/alpine/v3.10/main
# http://dl-cdn.alpinelinux.org/alpine/v3.10/community
# We need to use some which supports HTTPS which can be found here:
# https://github.com/alpinelinux/aports/blob/master/main/alpine-mirrors/mirrors.yaml
# Example:
# If we set https://mirror.fit.cvut.cz/alpine/ via
# RUN sed -i 's/http\:\/\/dl-cdn.alpinelinux.org/https\:\/\/mirror.fit.cvut.cz/g' /etc/apk/repositories
# then registry will be:
# https://mirror.fit.cvut.cz/alpine/v3.10/main
# https://mirror.fit.cvut.cz/alpine/v3.10/community
# Use --build-arg in docker build to pass custom APK registry host. Example:
# $ docker build --build-arg APK_PACKAGE_REGISTRY=mirror.fit.cvut.cz --no-cache --network host -t my-app .
ARG APK_PACKAGE_REGISTRY=apk.example.com/random/path/alpine-remote
RUN sed -i "s:http\:\/\/dl-cdn.alpinelinux.org:https\:\/\/${APK_PACKAGE_REGISTRY}:g" /etc/apk/repositories
RUN cat /etc/apk/repositories
RUN apk --no-cache add ca-certificates

# alpine contains /bin/sh so we can use it via
# $ docker exec -it my-app sh
FROM ${DOCKER_IMAGE_REGISTRY}/alpine
LABEL maintainer="bojan.komazec@example.com"
COPY --from=build /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY ./bin/my-app /my-app
ENTRYPOINT [ "/my-app" ]
CMD [ "--param0=arg0" ]

To build this image and override APK_PACKAGE_REGISTRY:

$ docker build --build-arg APK_PACKAGE_REGISTRY=mirror.fit.cvut.cz --no-cache --network host -t my-app-image .

bash - How do I use variables in a sed command? - Ask Ubuntu
How to assign variable and use sed to replace contents of configuration file in Dockerfile? - Unix & Linux Stack Exchange

USER

Sets the user name or UID to use when running the image and for any RUN, CMD and ENTRYPOINT instructions that follow it in the Dockerfile
This user is ignored by ADD and COPY instructions (you need to use --chown)
If your service/app can run as non-root, run it as non-root
node Docker images from Docker Hub contain by default a pre-created non-root user node and group node for this purpose

RUN groupadd -g 999 appuser && \
useradd -r -u 999 -g appuser appuser
USER appuser

WORKDIR

Sets the working directory for any RUN, CMD, ENTRYPOINT, COPY and ADD instructions that follow it in the Dockerfile
Usually it's one of the first commands in the Dockerfile (after FROM)
Format: WORKDIR dir

dir - directory to be set as working directory

If dir does not exist it will be created
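
A short sketch of how consecutive WORKDIR instructions behave; the directory names are arbitrary examples:

```dockerfile
FROM alpine:3.20

# Absolute path; created automatically if it does not exist
WORKDIR /app

# Relative path: resolved against the previous WORKDIR, giving /app/logs
WORKDIR logs

# Prints /app/logs during the build
RUN pwd
```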
Dockerfile reference | Docker Documentation

Towards unprivileged container builds

[Docker RUN vs CMD vs ENTRYPOINT by Yury Pitsishin]
What is the difference between CMD and ENTRYPOINT in a Dockerfile?

The ENTRYPOINT specifies a command that will always be executed when the container starts.
The CMD specifies arguments that will be fed to the ENTRYPOINT.

Dockerfile cheat sheet (kapeli.com)
Dockerfile: ENTRYPOINT vs CMD
