Friday, 10 May 2019

Introduction to Python

Python versions

To find out which Python version the python executable refers to, run:

$ python --version
Python 2.7.15rc1
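From inside a script, the same information is available through the standard sys module. A minimal sketch; the 3.6 minimum used in the guard is only an illustrative assumption:

```python
import sys

# sys.version is the full human-readable version string
print(sys.version)

# sys.version_info is a tuple-like object, so it can be compared with plain tuples
major, minor = sys.version_info[:2]
print("Running Python {}.{}".format(major, minor))

# Example guard: refuse to run on interpreters older than 3.6 (illustrative threshold)
if sys.version_info < (3, 6):
    sys.exit("This script needs Python 3.6 or newer")
```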

Installing packages

$ pip install jsonschema2db

pip - How do I install a Python package with a .whl file? - Stack Overflow

$ pip install --user /path/to/package.whl

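Globally installed packages are located in the interpreter's site-packages directory (dist-packages for the Debian/Ubuntu system Python), while packages installed with --user go to a per-user site-packages directory.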


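The exact directories differ between systems, so rather than hard-coding a path, Python itself can be asked. A sketch using the standard site and sysconfig modules; the hasattr() fallbacks are there because some virtual environments (reportedly older virtualenv releases) ship a site module without these functions:

```python
import site
import sysconfig

# Global (system-wide) package directories for this interpreter
if hasattr(site, "getsitepackages"):
    global_dirs = site.getsitepackages()
else:
    # Fallback: the "purelib" install path of the current scheme
    global_dirs = [sysconfig.get_paths()["purelib"]]

# Per-user directory used by "pip install --user"
user_dir = site.getusersitepackages() if hasattr(site, "getusersitepackages") else None

print("Global:", global_dirs)
print("User:  ", user_dir)
```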
Uninstalling Packages

Example of how to uninstall a globally installed package:

$ pip uninstall psycopg2
Uninstalling psycopg2-2.7.2:
  Would remove:
Proceed (y/n)? y
  Successfully uninstalled psycopg2-2.7.2

Importing packages

import datetime
from jsonschema2db import JSONSchemaToPostgres

Getting Package Info

$ python3
Python 3.6.7 (default, Oct 22 2018, 11:32:17) 
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import JSONSchema2DB
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'JSONSchema2DB'
>>> import jsonschema2db
>>> help(jsonschema2db)
>>> exit()

help(module) opens an interactive documentation view that lists all of the module's classes, functions, etc.:

Help on module jsonschema2db:


    class JSONSchemaToDatabase(builtins.object)
     |  JSONSchemaToDatabase is the mother class for everything
     |  :param schema: The JSON schema, as a native Python dict
     |  :param database_flavor: Either "postgres" or "redshift"


To see the documentation of a particular class within a package, we can start at the top level of the package and drill down:

$ python3
Python 3.6.9 (default, Apr 18 2020, 01:56:04) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import scrapy
>>> help(scrapy)
>>> help(scrapy.http)
>>> help(scrapy.http.response)
>>> help(scrapy.http.response.Response)

Another example:

(venv) xxx:~/path/to/tensorflow-demo$ python
Python 3.6.8 (default, Oct  7 2019, 12:59:55) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> help
Type help() for interactive help, or help(object) for help about object.
>>> help("packages")
No Python documentation found for 'packages'.
Use help() to get the interactive help utility.
Use help(str) for help on the str class.

>>> help("modules")
_operator           copyreg             nis                 tensorflow
_osx_support        crypt               nntplib             tensorflow_core
_pickle             cryptography        notebook            tensorflow_estimator


>>> help("tensorflow")

>>> help("tensorflow_core")
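help("modules") only prints its list. To get a similar list as data, for example to search it, the standard pkgutil module can be used. A sketch; the "tensor" prefix filter is purely illustrative:

```python
import pkgutil
import sys

# iter_modules() with no argument scans every directory on sys.path and yields
# (module_finder, name, ispkg) tuples for the top-level modules found there.
found = {name for _, name, _ in pkgutil.iter_modules()}

# Modules compiled into the interpreter (e.g. sys) are not files on sys.path,
# so they are added separately.
names = sorted(found | set(sys.builtin_module_names))
print(len(names), "top-level modules")

# Example: only the names that start with "tensor"
print([n for n in names if n.startswith("tensor")])
```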

pydoc is a Python module that generates documentation about modules, classes, functions and methods.

First, get a list of all available modules, either from the shell:

$ pydoc modules

...or from the interpreter:

$ python
Python 3.6.8 (default, Oct  7 2019, 12:59:55) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> help('modules')

Then, to view the documentation of a particular module, run:

$ pydoc <module_name>


$ pydoc tensorflow_core
$ pydoc tensorflow_core.keras
$ pydoc tensorflow_core.keras.datasets.mnist

pydoc path:

$ which pydoc

pydoc can also be used from within a virtual environment, because the environment's activate script defines a pydoc shell function that runs python -m pydoc with the environment's interpreter (see the section about virtual environments in this article).

How to find package version programmatically?

Many packages set a module-level variable __version__ (or, less commonly, VERSION) to their current version, so to find the version of a package you can do the following:

import a
a.__version__ # or a.VERSION
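Since __version__ and VERSION are only conventions, a more robust approach is to ask the installed package's metadata first and fall back to those attributes. A sketch, assuming Python 3.8+ for importlib.metadata (on older versions such as the 3.6 used above, it falls back to the attributes only); package_version is a hypothetical helper name:

```python
import importlib

try:  # Python 3.8+
    from importlib.metadata import version as dist_version, PackageNotFoundError
except ImportError:  # older Python: metadata lookup not available
    dist_version = None


def package_version(import_name, dist_name=None):
    """Return the version of an installed package as a string, or None.

    dist_name is the name pip knows the package by (e.g. "JSONSchema2DB"),
    which can differ from the import name (e.g. "jsonschema2db").
    """
    # 1) Prefer the installed distribution's metadata
    if dist_version is not None:
        try:
            return dist_version(dist_name or import_name)
        except PackageNotFoundError:
            pass
    # 2) Fall back to the module-level conventions (imports the module)
    module = importlib.import_module(import_name)
    return getattr(module, "__version__", getattr(module, "VERSION", None))


print(package_version("math"))  # math defines neither attribute
```

For the package used earlier this would be called as package_version("jsonschema2db", "JSONSchema2DB"), which mirrors the import-name versus pip-name mismatch shown by the ModuleNotFoundError above.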




def add(a, b):
    return a+b

Python 3 introduced optional function annotations (PEP 3107), and Python 3.5 (2015) standardized their use as type hints (PEP 484):

def add(a: float, b: float) -> float:
    return a+b
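Annotations are stored on the function object but are not enforced at run time. A small sketch (repeating the add function so it runs on its own):

```python
import inspect


def add(a: float, b: float) -> float:
    return a + b


# Annotations are kept in a dict on the function object...
print(add.__annotations__)
# ...and are shown by inspect.signature()
print(inspect.signature(add))  # (a: float, b: float) -> float

# ...but the interpreter does not check them: strings are accepted too
print(add("py", "thon"))  # python
```

Checking annotations is left to external tools such as type checkers; the interpreter only records them.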

After importing a module, we can use dir(module) to list all its variables and functions. To find out a function's parameters (including argument and return types, where the function is annotated), we can use inspect.signature or pydoc.render_doc. Example:

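import inspect
import pydoc
from tensorflow.keras.datasets import mnist  # assumes TensorFlow is installed
print("inspect.signature(mnist.load_data): {}".format(inspect.signature(mnist.load_data)))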

print("pydoc.render_doc(mnist.load_data): {}".format(pydoc.render_doc(mnist.load_data)))
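The snippet above needs TensorFlow. The same introspection calls work on any function, so here is a sketch against the standard json module instead:

```python
import inspect
import json
import pydoc

# dir() lists the names a module defines (private ones filtered out here)
print([name for name in dir(json) if not name.startswith("_")])

# Parameter names (and defaults/annotations, if any) of a function
sig = inspect.signature(json.dumps)
print(sig)
print(list(sig.parameters))

# The text help() would show, returned as a plain string instead of paged
doc = pydoc.render_doc(json.dumps, renderer=pydoc.plaintext)
print(doc[:300])
```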

introspection - How do I look inside a Python object? - Stack Overflow
PEP 257 -- Docstring Conventions


class MyClass:
    pass

In Python 3, every class implicitly inherits from object.

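A quick way to see the implicit base class, and the inheritance chain of a derived class, is through __bases__ and __mro__. A sketch; the class names are placeholders:

```python
class MyClass:  # same as "class MyClass(object):" in Python 3
    pass


class Base:
    pass


class Derived(Base):
    pass


print(MyClass.__bases__)             # (<class 'object'>,)
print(Derived.__mro__)               # lookup order: Derived, Base, object
print(issubclass(Derived, object))   # True
```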

class Derived(Base):
    pass

Member Functions

class C:
    def foo(self, a):
        pass  # method body goes here

    def bar(self, b):
        pass

Python call function within class - Stack Overflow
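In short: inside a method, other methods of the same object are reached through self. A sketch with hypothetical method bodies:

```python
class C:
    def foo(self, a):
        return a * 2

    def bar(self, b):
        # Call the sibling method through self; a bare foo(b) would raise NameError
        return self.foo(b) + 1


c = C()
print(c.bar(5))  # 11
```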

Data Types


Python List


Python Dictionary Methods
python dictionaries and JSON (crash course)
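The standard json module converts between dictionaries and JSON text. A minimal sketch with made-up data:

```python
import json

person = {"name": "Alice", "languages": ["Python", "SQL"], "active": True, "manager": None}

# dict -> JSON string (True becomes true, None becomes null)
text = json.dumps(person, indent=2, sort_keys=True)
print(text)

# JSON string -> dict
parsed = json.loads(text)
print(parsed["languages"][0])  # Python
```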


Dependencies & Package Management

pip - Package Manager

$ pip

  pip <command> [options]

  install                     Install packages.
  download                    Download packages.
  uninstall                   Uninstall packages.
  freeze                      Output installed packages in requirements format.
  list                        List installed packages.
  show                        Show information about installed packages.
  check                       Verify installed packages have compatible dependencies.
  search                      Search PyPI for packages.
  wheel                       Build wheels from your requirements.
  hash                        Compute hashes of package archives.
  completion                  A helper command used for command completion.
  help                        Show help for commands.

General Options:
  -h, --help                  Show help.
  --isolated                  Run pip in an isolated mode, ignoring environment variables and user configuration.
  -v, --verbose               Give more output. Option is additive, and can be used up to 3 times.
  -V, --version               Show version and exit.
  -q, --quiet                 Give less output. Option is additive, and can be used up to 3 times (corresponding to WARNING, ERROR, and CRITICAL logging levels).
  --log <path>                Path to a verbose appending log.
  --proxy <proxy>             Specify a proxy in the form [user:passwd@]proxy.server:port.
  --retries <retries>         Maximum number of retries each connection should attempt (default 5 times).
  --timeout <sec>             Set the socket timeout (default 15 seconds).
  --exists-action <action>    Default action when a path already exists: (s)witch, (i)gnore, (w)ipe, (b)ackup, (a)bort.
  --trusted-host <hostname>   Mark this host as trusted, even though it does not have valid or any HTTPS.
  --cert <path>               Path to alternate CA bundle.
  --client-cert <path>        Path to SSL client certificate, a single file containing the private key and the certificate in PEM format.
  --cache-dir <dir>           Store the cache data in <dir>.
  --no-cache-dir              Disable the cache.
                              Don't periodically check PyPI to determine whether a new version of pip is available for download. Implied with --no-index.

To check pip's version and which Python version it belongs to:

$ pip -V
pip 9.0.1 from /usr/lib/python2.7/dist-packages (python 2.7)

To list installed packages:

$ pip list
DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the [list] section) to disable this warning.
asn1crypto (0.24.0)
change-case (0.5.2)
cryptography (2.1.4)
dnspython (1.15.0)
enum34 (1.1.6)
gyp (0.1)
idna (2.6)
ipaddress (1.0.17)
iso8601 (0.1.12)
JSONSchema2DB (1.0.1)

To install some package via pip:

$ pip install jsonschema2db

To show details of an installed package (including its location on disk!), use pip show:

$ pip show jsonschema2db
Name: JSONSchema2DB
Version: 1.0.1
Summary: Generate database tables from JSON schema
Author: Erik Bernhardsson
License: MIT
Location: /home/bojan/.local/lib/python2.7/site-packages
Requires: iso8601, change-case, psycopg2

To uninstall a package, use:

$ pip uninstall JSONSchema2DB
Uninstalling JSONSchema2DB-1.0.1:
Proceed (y/n)? y
  Successfully uninstalled JSONSchema2DB-1.0.1


$ pip freeze

To save this info in requirements.txt:

$ pip freeze > requirements.txt

Dockerfile before introducing requirements.txt:

RUN python3 -m pip install psycopg2
RUN python3 -m pip install jsonschema2db



Dockerfile after introducing requirements.txt (the file has to be copied into the image first):

COPY requirements.txt .
RUN python3 -m pip install -r requirements.txt

We can list just the package names in this file, but sometimes it is necessary to pin their versions as well; otherwise we might get a pip error like this:

ERROR: jsonschema2db 1.0.1 has requirement psycopg2==2.7.2, but you'll have psycopg2 2.8.2 which is incompatible.


pip3 is the Python 3 version of pip. On this system plain pip belongs to Python 2.7, so packages installed with it are available only to Python 2.7; use pip3 (or python3 -m pip) to install packages for Python 3. [source]

If pip3 is not installed:

$ python3 -m pip install JSONSchema2DB
/usr/bin/python3: No module named pip

$ pip3
Command 'pip3' not found, but can be installed with:
sudo apt install python3-pip

Let's install it:

$ sudo apt install python3-pip
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...
Setting up python3.6-dev (3.6.7-1~18.04) ...
Setting up python3-lib2to3 (3.6.7-1~18.04) ...
Setting up python3-distutils (3.6.7-1~18.04) ...
Setting up libpython3-dev:amd64 (3.6.7-1~18.04) ...
Setting up python3-pip (9.0.1-2.3~ubuntu1) ...
Setting up python3-setuptools (39.0.1-2) ...
Setting up dh-python (3.20180325ubuntu2) ...
Setting up python3-dev (3.6.7-1~18.04) ...

Now we have:

$ pip -V
pip 9.0.1 from /usr/lib/python2.7/dist-packages (python 2.7)
$ pip3 -V
pip 9.0.1 from /usr/lib/python3/dist-packages (python 3.6)

We can now use it to install packages for Python 3:

$ python3 -m pip install JSONSchema2DB
Collecting JSONSchema2DB
  Using cached 
Installing collected packages: change-case, iso8601, psycopg2, JSONSchema2DB
Successfully installed JSONSchema2DB-1.0.1 change-case-0.5.2 iso8601-0.1.12 psycopg2-2.7.2

Let's verify it is installed for Python3:

$ pip3 show JSONSchema2DB
Name: JSONSchema2DB
Version: 1.0.1
Summary: Generate database tables from JSON schema
Author: Erik Bernhardsson
License: MIT
Location: /home/bojan/.local/lib/python3.6/site-packages
Requires: change-case, iso8601, psycopg2

Special Variables


When the Python interpreter reads a source file, it first defines a few special variables, e.g. __name__. If you are running your module (the source file) as the main program, e.g.

$ python

...the interpreter will assign the hard-coded string "__main__" to the __name__ variable.

If there's a statement like this in the main program, or in some other module the main program imports:

# Suppose this is in some other main program.
import foo

The interpreter will search for your file (along with searching for a few other variants), and prior to executing that module, it will assign the name "foo" from the import statement to the __name__ variable, i.e.

# It's as if the interpreter inserts this at the top
# of your module when it's imported from another module.
__name__ = "foo"
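This is what makes the common "main guard" idiom work. A minimal sketch, imagining the file is saved as foo.py (a hypothetical name matching the import above):

```python
def main():
    # Work that should happen only when the file is executed directly
    return "running as a script"


# "python foo.py": __name__ is "__main__", so main() is called.
# "import foo" from another module: __name__ is "foo", so this block is
# skipped and importing the module has no side effects.
if __name__ == "__main__":
    print(main())
```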

An Introduction To Venv

Install all necessary components:

$ sudo apt update
$ sudo apt install python3-dev python3-pip
$ sudo pip3 install -U virtualenv  # system-wide install


$ python3 --version
$ pip3 --version
$ virtualenv --version

Create a virtual environment (here we use python3 as the Python interpreter and venv as the directory in which to store the environment):

$ virtualenv --system-site-packages -p python3 ./venv

Once the virtual environment is created, it contains the script venv/bin/activate. To inspect it:

$ cat venv/bin/activate

deactivate() {

export PATH
pydoc () {
    python -m pydoc "$@"

We can see that activating the environment sets a new environment variable, VIRTUAL_ENV, and defines a deactivate command (as well as a pydoc function).

To activate virtual environment:

$ source ./venv/bin/activate

When the virtual environment is active, the shell prompt is prefixed with its name in the form (venv).

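Besides the (venv) prompt and the VIRTUAL_ENV variable, a Python program can detect a virtual environment itself. A best-effort sketch; in_virtualenv is a hypothetical helper, and the real_prefix check targets virtualenv releases older than 20:

```python
import sys


def in_virtualenv():
    """Best-effort check whether this interpreter runs inside a virtual environment."""
    # venv (and virtualenv >= 20) point sys.prefix at the environment directory,
    # while sys.base_prefix keeps pointing at the base Python installation.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    # Older virtualenv releases set sys.real_prefix instead.
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


print("virtual environment active:", in_virtualenv())
print("sys.prefix:", sys.prefix)
```

Unlike VIRTUAL_ENV, which only the activate script sets, this check also works when venv/bin/python is invoked directly without activating the environment.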
If we've saved requirements.txt, we can install all dependencies now with:

$ pip install -r requirements.txt

To exit virtual environment:

(venv) $ deactivate

How to rename virtual environment?

directory - How to rename a virtualenv in Python? - Stack Overflow

From inside active virtual environment:

$ pip freeze > requirements.txt
$ deactivate

From python - Pip freeze vs. pip list - Stack Overflow
pip list shows ALL installed packages.
pip freeze shows the packages YOU installed via pip (or pipenv, if you use that tool), in requirements format.
Also, see the article `$ pip freeze > requirements.txt` considered harmful.

Then delete old environment directory. E.g.

$ rm -r ~/python-envs/tensorrt-

...and create a new one with correct name:

$ python3 -m virtualenv -p python3 ~/python-envs/tensorrt-
Already using interpreter /usr/bin/python3
Using base prefix '/usr'
New python executable in /home/nvidia/python-envs/tensorrt-
Also creating executable in /home/nvidia/python-envs/tensorrt-
Installing setuptools, pkg_resources, pip, wheel...done.

Activate it and install all packages from the saved list:

$ source ~/python-envs/tensorrt-
$ pip install -r requirements.txt
$ pip3 install --extra-index-url tensorflow-gpu==1.15.0+nv19.12

Installing Python in Docker

How to compare version strings?

How do I compare version numbers in Python?

>>> from packaging import version
>>> version.parse("2.3.1") < version.parse("10.1.2")
True

If module packaging is not installed, you might get the following error:

No module named 'packaging'

The solution is to install it with pip (optionally updating pip itself first):

$ pip3 install --upgrade pip
$ pip3 install packaging
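Plain string comparison gets this wrong, because strings are compared character by character. For purely numeric versions, a dependency-free sketch is to compare tuples of integers. This is an approximation, not a replacement for packaging, which also handles pre-releases such as 1.0rc1 and treats 1.0 and 1.0.0 as equal:

```python
def parse_simple_version(s):
    """Turn "2.3.1" into (2, 3, 1); only handles purely numeric dotted versions."""
    return tuple(int(part) for part in s.split("."))


# String comparison is lexicographic: "1" < "2", so "10.1.2" sorts first
print("2.3.1" < "10.1.2")  # False
# Integer tuples compare element by element, giving the numeric ordering
print(parse_simple_version("2.3.1") < parse_simple_version("10.1.2"))  # True
```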

References and Further Reading

Learn Python Programming - The Definitive Guide

How to install multiple python packages at once using pip

Introduction to PostgreSQL

Installation & Running 

To initialize a new database cluster (data directory):

$ initdb /usr/local/pgsql/data

To start that instance:

$ postgres -D /usr/local/pgsql/data

Naming Convention

Identifiers and Key Words

SQL identifiers and key words must begin with a letter (a-z, but also letters with diacritical marks and non-Latin letters) or an underscore (_). Subsequent characters in an identifier or key word can be letters, underscores, digits (0-9), or dollar signs ($).
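The rule above is easy to express as a check. A sketch (is_valid_unquoted_identifier is a hypothetical helper) that approximates "letter" with Python's str.isalpha() and does not check reserved key words or the identifier length limit:

```python
def is_valid_unquoted_identifier(name):
    """Check the first-character / subsequent-character rule for SQL identifiers."""
    if not name:
        return False
    first, rest = name[0], name[1:]
    # First character: a letter (isalpha() also accepts letters with diacritical
    # marks and non-Latin letters) or an underscore
    if not (first.isalpha() or first == "_"):
        return False
    # Subsequent characters: letters, underscores, digits 0-9 or dollar signs
    return all(c.isalpha() or c in "_$0123456789" for c in rest)


print(is_valid_unquoted_identifier("my_table"))  # True
print(is_valid_unquoted_identifier("1abc"))      # False
```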

What is a valid PostgreSQL database name?

PostgreSQL naming conventions

System Functions & General Queries

How to check if a table exists in a given schema

SELECT to_regclass('schema_name.table_name');

This query returns the name of the table if it exists, and null if it doesn't.


PostgreSQL Data Types


PostgreSQL determine column type when data_type is set to ARRAY

SELECT column_name, data_type, udt_name::regtype
FROM information_schema.columns 
WHERE table_schema = 'public'
  AND table_name='table_name';

Postgres - How to check for an empty array

WHERE datasets = '{}'


Create table with a column of TEXT[] type:

CREATE TABLE test (
    id integer NOT NULL,
    list text[] COLLATE pg_catalog."default",
    CONSTRAINT test_pkey PRIMARY KEY (id)
);

insert into test (id, list) values (1, array['one', 'two', 'three']);
insert into test (id, list) values (2, array['four']);
insert into test (id, list) values (3, array['']);
insert into test (id, list) values (4, array[]::text[]); -- empty array
insert into test (id, list) values (5, null);

select * from test where list = null;
returns no rows, because comparing anything with NULL using = yields NULL (unknown), never true.

select * from test where list = '{}'; 
returns row with id = 4.

The following queries are invalid (have syntax errors):

select * from test where list = {};
select * from test where list = [];
select * from test where list = [null];

...but this one works:

select * from test where list is null;
returns the row with id = 5.

How to write a WHERE clause for NULL value in ARRAY type column?
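The = NULL versus IS NULL behaviour is standard SQL three-valued logic rather than something specific to arrays, so it can be reproduced without a PostgreSQL server. The sketch below uses Python's built-in sqlite3 module as a stand-in; SQLite has no array type, so a plain TEXT column replaces text[]:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, list TEXT)")
conn.executemany(
    "INSERT INTO test (id, list) VALUES (?, ?)",
    [(1, "one,two,three"), (2, "four"), (3, ""), (5, None)],
)

# "= NULL" compares with an unknown value, so the condition is never true
eq_null = conn.execute("SELECT id FROM test WHERE list = NULL").fetchall()
# "IS NULL" is the correct test for a missing value
is_null = conn.execute("SELECT id FROM test WHERE list IS NULL").fetchall()
# An empty string is a value, not NULL
empty = conn.execute("SELECT id FROM test WHERE list = ''").fetchall()

print(eq_null)  # []
print(is_null)  # [(5,)]
print(empty)    # [(3,)]
```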

Note that array indexes start at 1 (not 0):

select * from app_version where app_version_number[1] in ('75.0.1329.81', '75.0.1329.82');


Use single quotes in SQL

COALESCE(descriptions[1], '') as description,
status = 'publish'

If we use "" (double quoted empty string) as default value in COALESCE command, we'll get an error:

ERROR: zero-length delimited identifier at or near """"

postgresql - ERROR: zero-length delimited identifier at or near """" LINE 1: DELETE FROM "regions" WHERE "regions"."" = $1 - Stack Overflow :
You want single quotes not double quotes around empty strings, double quotes make delimited identifiers, and "" isn't a meaningful identifier.

Query Examples


Be careful!

DELETE FROM contacts 
WHERE id = 3 AND id = 4;

won't delete any rows: each row has a single id, which can't be both 3 and 4 at the same time.
What you want instead is:

DELETE FROM contacts 
WHERE id = 3 OR id = 4;
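Both variants can be tried without a PostgreSQL server. The sketch below uses Python's built-in sqlite3 module as a stand-in, with a simplified, made-up contacts table holding ids 1 to 5:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO contacts (id, name) VALUES (?, ?)",
                 [(i, "contact %d" % i) for i in range(1, 6)])

# AND: no single row has id 3 and id 4 at once, so nothing is deleted
deleted_and = conn.execute("DELETE FROM contacts WHERE id = 3 AND id = 4").rowcount
# OR: rows with id 3 or id 4 are deleted (same as WHERE id IN (3, 4))
deleted_or = conn.execute("DELETE FROM contacts WHERE id = 3 OR id = 4").rowcount

remaining = [row[0] for row in conn.execute("SELECT id FROM contacts ORDER BY id")]
print(deleted_and, deleted_or, remaining)  # 0 2 [1, 2, 5]
```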


PostgreSQL ADD COLUMN: Add One Or More Columns To a Table

ALTER TABLE table_name
ADD COLUMN new_column_name data_type constraint;


ALTER TABLE customers 

ALTER TABLE customers 

ALTER TABLE customer 

ALTER TABLE contacts 
ADD COLUMN magic_numbers INTEGER[];

ALTER TABLE contacts 
ADD COLUMN resources JSON[];


BETWEEN is useful e.g. when selecting rows whose id lies between (and including) two values.

Select a range of integers


value BETWEEN low AND high;

is the same as:

value >= low and value <= high;


select *
from yourtable
where col1 >= 33254 and col1 <= 33848;

is the same as:

select *
from yourtable
where col1 between 33254 and 33848;

Example in SQL Fiddle
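The same equivalence, checked with Python's built-in sqlite3 module as a stand-in for PostgreSQL (table contents are made up; note that both bounds are included):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (col1 INTEGER)")
conn.executemany("INSERT INTO yourtable VALUES (?)",
                 [(v,) for v in (33253, 33254, 33500, 33848, 33849)])

between = conn.execute(
    "SELECT col1 FROM yourtable WHERE col1 BETWEEN 33254 AND 33848 ORDER BY col1"
).fetchall()
explicit = conn.execute(
    "SELECT col1 FROM yourtable WHERE col1 >= 33254 AND col1 <= 33848 ORDER BY col1"
).fetchall()

print(between)              # [(33254,), (33500,), (33848,)]
print(between == explicit)  # True
```

If low is greater than high, plain BETWEEN matches nothing; PostgreSQL additionally offers BETWEEN SYMMETRIC, which does not require the bounds to be in order.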


Cast a value to a specific type.

If table books has a column id of type integer and we want to return all ids as strings:

select cast(id as text) from books

...will return a column named id and of type text.


PostgreSQL does not have the ISNULL() function known from SQL Server. However, you can use the COALESCE function, which provides similar functionality: it accepts any number of arguments and returns the first argument that is not null. If all arguments are null, COALESCE returns null.

COALESCE(expression, replacement)

example (country_names is of type text[]):

SELECT COALESCE(country_names[1], '') AS names FROM countries;

If we transform null values into empty strings at the SQL level, we don't have to do that in the application (e.g. in Go applications we don't have to use sql.NullString, just a string, as the type of the target field into which text is read from the DB).

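A sketch of the null-to-empty-string idea, using Python's sqlite3 module as a stand-in (SQLite has no arrays, so a plain TEXT column replaces country_names[1]; the data is made up). The Python application then receives '' instead of None:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE countries (id INTEGER PRIMARY KEY, country_name TEXT)")
conn.executemany("INSERT INTO countries (id, country_name) VALUES (?, ?)",
                 [(1, "Germany"), (2, None)])

rows = conn.execute(
    "SELECT id, COALESCE(country_name, '') AS name FROM countries ORDER BY id"
).fetchall()
print(rows)  # [(1, 'Germany'), (2, '')]

# COALESCE returns its first non-null argument, or NULL (None) if all are null
first = conn.execute("SELECT COALESCE(NULL, NULL, 'x', 'y')").fetchone()[0]
all_null = conn.execute("SELECT COALESCE(NULL, NULL)").fetchone()[0]
print(first, all_null)  # x None
```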
What is the PostgreSQL equivalent for ISNULL()? - Quora


Create database:



CREATE TABLE customers (
   customer_name VARCHAR NOT NULL
);


-- Table: public.test

-- DROP TABLE public.test;

CREATE TABLE public.test
(
    id integer NOT NULL,
    list text[] COLLATE pg_catalog."default",
    CONSTRAINT test_pkey PRIMARY KEY (id)
)
TABLESPACE pg_default;

ALTER TABLE public.test
    OWNER to postgres;


Delete all rows from table:

DELETE FROM my_table;


WHERE condition;

USING another_table


WHERE = (SELECT id FROM another_table);


WHERE id = 8;

my_db=# delete from my_schema.my_table;


To return distinct values from some column:

select distinct column_name from table_name;

So if a column has 100 rows but only 3 distinct values, select distinct returns only 3 rows, one for each unique value.

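A sketch reproducing the 100-rows, 3-values case with Python's sqlite3 module as a stand-in (the orders table and its values are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
statuses = ["new", "paid", "shipped"]
# 100 rows, but only 3 distinct status values
conn.executemany("INSERT INTO orders (status) VALUES (?)",
                 [(statuses[i % 3],) for i in range(100)])

total = conn.execute("SELECT count(*) FROM orders").fetchone()[0]
distinct = conn.execute("SELECT DISTINCT status FROM orders ORDER BY status").fetchall()
print(total, distinct)  # 100 [('new',), ('paid',), ('shipped',)]
```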




DROP TABLE author;


The IN operator filters rows whose value is in a list of values:

SELECT * FROM Customers WHERE Country IN ('Germany', 'France', 'UK');
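When the list comes from application code, it is safer to pass the values as query parameters than to paste them into the SQL string. A sketch with Python's sqlite3 module as a stand-in, building one placeholder per value (with psycopg2, a Python tuple passed as a single parameter is adapted to an IN list instead, e.g. WHERE Country IN %s):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (name TEXT, Country TEXT)")
conn.executemany("INSERT INTO Customers VALUES (?, ?)",
                 [("A", "Germany"), ("B", "Spain"), ("C", "UK"), ("D", "France")])

wanted = ["Germany", "France", "UK"]
# One "?" placeholder per value: "?, ?, ?"
placeholders = ", ".join("?" * len(wanted))
rows = conn.execute(
    "SELECT name FROM Customers WHERE Country IN (%s) ORDER BY name" % placeholders,
    wanted,
).fetchall()
print(rows)  # [('A',), ('C',), ('D',)]
```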


Insert a row into table:

INSERT INTO schema_name.table_name (col1, col2, ...) 
VALUES (val1, val2, ...);


my_db=# insert into my_schema.my_table (item_id, prefix, post_id) values (1, 'a1',1234);

Does an insert query have to list all columns?
No; columns that are not listed get their default values (NULL if no default is defined).

INSERT into table without specifying Column names - this is not good practice.

The following queries will add new rows - they will not update existing rows:

INSERT INTO customers (customer_name)

INSERT INTO contacts (magic_numbers)
VALUES
   (ARRAY[1, 11, 111]),
   (ARRAY[2, 22, 222]);

Examples of how to insert an array of JSON objects:

insert into public.contacts (name, phones, magic_numbers, resources) 
values (
   'Mike Tyson', 
   ARRAY['1-800-3433', '1-800-3434'], 
   ARRAY[1, 22, 33], 
   ARRAY['{ "url": "", "desc": "Wikipedia page" }']::json[]
);

insert into public.contacts (name, phones, magic_numbers, resources) 
values (
   'Tom Cruise', 
   ARRAY['1-800-4433', '1-800-4434'], 
   ARRAY[2, 33, 444],
   ARRAY[
      '{ "url": "", "desc": "Wikipedia page" }',
      '{ "url": "", "desc": "Rational Wiki page" }'
   ]::json[]
);


DELETE FROM contacts 
WHERE id = 3 OR id = 4;


List all table names and number of rows:

SELECT relname,n_live_tup 
FROM pg_stat_user_tables 
ORDER BY n_live_tup DESC;

Print the number of rows in a table:

select count(*) from campaign;

Select all rows in a table:

SELECT * FROM my_table;

Select specific columns from a table:

SELECT col1, col2,... FROM schema.table;


my_db=# select id, item_id, prefix from my_schema.my_table;
 id | item_id | prefix 
(0 rows)


To add a value of a (new) column to an existing row:

UPDATE contacts 
SET magic_numbers = ARRAY[1, 11, 111]
WHERE id = 1;



SET PUBLISHED_DATE = '01/10/1967',
   AUTHOR = 'Ivo Andric',
   NAME = 'Na Drini Cuprija';

UPDATE contacts 
SET magic_numbers = ARRAY[2, 22, 222]
WHERE id = 2;

In SQL, How to add values after add a new column in the existing table?


... WHERE name = 'Bill';
... WHERE id = 1;

...WHERE (last_name = 'Smith')
OR (last_name = 'Anderson' AND state = 'Florida')
OR (last_name = 'Ferguson' AND status = 'Active' AND state = 'California');

...WHERE (last_name = 'Anderson' OR last_name = 'Smith')
AND customer_id > 340;

... WHERE employee_id >= 500
AND (last_name = 'Smith' OR last_name = 'Johnson');

How to write condition on column of an array type e.g. TEXT[]?

... WHERE my_column[1] = '1'

... WHERE aliases[1] <> aliases[2];

(!) NOTE: By default, the lower bound index value of an array's dimensions is set to one.


PostgreSQL Transaction


BEGIN starts a new transaction.


... goes a set of queries...
...that represent an operation...
...that we want to be an atomic one...


COMMIT commits the transaction.


SAVEPOINT introduces a mark in the sequence of queries: queries before the mark stay part of the transaction and will be committed with it, while queries after the mark can be reverted (rolled back) with ROLLBACK TO.

... goes a set of queries...
...that might be committed...
SAVEPOINT my_savepoint;
... goes a set of queries...
...that might be rolled back...
ROLLBACK TO my_savepoint;
... goes a set of queries...
...that might be committed...


ROLLBACK rolls back (aborts) the whole transaction.

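The savepoint sequence above can be tried with Python's built-in sqlite3 module, which supports the same BEGIN, SAVEPOINT, ROLLBACK TO and COMMIT statements. A sketch with a made-up accounts table (SQLite standing in for PostgreSQL):

```python
import sqlite3

# isolation_level=None: the sqlite3 module does not open transactions on its
# own, so the transaction statements below are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")

conn.execute("BEGIN")
conn.execute("INSERT INTO accounts (id, balance) VALUES (1, 100)")  # kept
conn.execute("SAVEPOINT my_savepoint")
conn.execute("INSERT INTO accounts (id, balance) VALUES (2, 200)")  # undone below
conn.execute("ROLLBACK TO my_savepoint")
conn.execute("INSERT INTO accounts (id, balance) VALUES (3, 300)")  # kept
conn.execute("COMMIT")

ids = conn.execute("SELECT id FROM accounts ORDER BY id").fetchall()
print(ids)  # [(1,), (3,)]
```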

To find out the path to the Postgres config file, use:

jsonschema2db-test=# SHOW config_file;
(1 row)

Now we can exit the psql terminal and display the config file:

jsonschema2db-test-# \q

bash-4.3$ cat /home/app/postgres/postgresql.conf
# -----------------------------
# PostgreSQL configuration file
# -----------------------------
# This file consists of lines of the form:
#   name = value
# (The "=" is optional.)  Whitespace may be used.  Comments are introduced with
# "#" anywhere on a line.  The complete list of parameter names and allowed
# values can be found in the PostgreSQL documentation.
# The commented-out settings shown in this file represent the default values.
# Re-commenting a setting is NOT sufficient to revert it to the default value;
# you need to reload the server.
# This file is read on server startup and when the server receives a SIGHUP
# signal.  If you edit the file on a running system, you have to SIGHUP the
# server for the changes to take effect, or use "pg_ctl reload".  Some
# parameters, which are marked below, require a server shutdown and restart to
# take effect.
# Any parameter can also be given as a command-line option to the server, e.g.,
# "postgres -c log_connections=on".  Some parameters can be changed at run time
# with the "SET" SQL command.
# Memory units:  kB = kilobytes        Time units:  ms  = milliseconds
#                MB = megabytes                     s   = seconds
#                GB = gigabytes                     min = minutes
#                TB = terabytes                     h   = hours
#                                                   d   = days


# The default values of these variables are driven from the -D command-line
# option or PGDATA environment variable, represented here as ConfigDir.

#data_directory = 'ConfigDir'           # use data in another directory
                                        # (change requires restart)
#hba_file = 'ConfigDir/pg_hba.conf'     # host-based authentication file
                                        # (change requires restart)
#ident_file = 'ConfigDir/pg_ident.conf' # ident configuration file
                                        # (change requires restart)

# If external_pid_file is not explicitly set, no extra PID file is written.
#external_pid_file = ''                 # write an extra PID file
                                        # (change requires restart)


# - Connection Settings -

listen_addresses = '*'
                                        # comma-separated list of addresses;
                                        # defaults to 'localhost'; use '*' for all
                                        # (change requires restart)
#port = 5432                            # (change requires restart)
max_connections = 100                   # (change requires restart)
#superuser_reserved_connections = 3     # (change requires restart)
#unix_socket_directories = '/var/run/postgresql'        # comma-separated list of directories
                                        # (change requires restart)
#unix_socket_group = ''                 # (change requires restart)
#unix_socket_permissions = 0777         # begin with 0 to use octal notation
                                        # (change requires restart)
#bonjour = off                          # advertise server via Bonjour
                                        # (change requires restart)
#bonjour_name = ''                      # defaults to the computer name
                                        # (change requires restart)

# - Security and Authentication -

#authentication_timeout = 1min          # 1s-600s
#ssl = off                              # (change requires restart)
#ssl_ciphers = 'HIGH:MEDIUM:+3DES:!aNULL' # allowed SSL ciphers
                                        # (change requires restart)
#ssl_prefer_server_ciphers = on         # (change requires restart)
#ssl_ecdh_curve = 'prime256v1'          # (change requires restart)
#ssl_cert_file = 'server.crt'           # (change requires restart)
#ssl_key_file = 'server.key'            # (change requires restart)
#ssl_ca_file = ''                       # (change requires restart)
#ssl_crl_file = ''                      # (change requires restart)
#password_encryption = on
#db_user_namespace = off
#row_security = on

# GSSAPI using Kerberos
#krb_server_keyfile = ''
#krb_caseins_users = off

# - TCP Keepalives -
# see "man 7 tcp" for details

#tcp_keepalives_idle = 0                # TCP_KEEPIDLE, in seconds;
                                        # 0 selects the system default
#tcp_keepalives_interval = 0            # TCP_KEEPINTVL, in seconds;
                                        # 0 selects the system default
#tcp_keepalives_count = 0               # TCP_KEEPCNT;
                                        # 0 selects the system default


# - Memory -

shared_buffers = 128MB                  # min 128kB
                                        # (change requires restart)
#huge_pages = try                       # on, off, or try
                                        # (change requires restart)
#temp_buffers = 8MB                     # min 800kB
#max_prepared_transactions = 0          # zero disables the feature
                                        # (change requires restart)
# Caution: it is not advisable to set max_prepared_transactions nonzero unless
# you actively intend to use prepared transactions.
#work_mem = 4MB                         # min 64kB
#maintenance_work_mem = 64MB            # min 1MB
#replacement_sort_tuples = 150000       # limits use of replacement selection sort
#autovacuum_work_mem = -1               # min 1MB, or -1 to use maintenance_work_mem
#max_stack_depth = 2MB                  # min 100kB
dynamic_shared_memory_type = posix      # the default is the first option
                                        # supported by the operating system:
                                        #   posix
                                        #   sysv
                                        #   windows
                                        #   mmap
                                        # use none to disable dynamic shared memory

# - Disk -

#temp_file_limit = -1                   # limits per-process temp file space
                                        # in kB, or -1 for no limit

# - Kernel Resource Usage -

#max_files_per_process = 1000           # min 25
                                        # (change requires restart)
#shared_preload_libraries = ''          # (change requires restart)

# - Cost-Based Vacuum Delay -

#vacuum_cost_delay = 0                  # 0-100 milliseconds
#vacuum_cost_page_hit = 1               # 0-10000 credits
#vacuum_cost_page_miss = 10             # 0-10000 credits
#vacuum_cost_page_dirty = 20            # 0-10000 credits
#vacuum_cost_limit = 200                # 1-10000 credits

# - Background Writer -

#bgwriter_delay = 200ms                 # 10-10000ms between rounds
#bgwriter_lru_maxpages = 100            # 0-1000 max buffers written/round
#bgwriter_lru_multiplier = 2.0          # 0-10.0 multiplier on buffers scanned/round
#bgwriter_flush_after = 512kB           # measured in pages, 0 disables

# - Asynchronous Behavior -

#effective_io_concurrency = 1           # 1-1000; 0 disables prefetching
#max_worker_processes = 8               # (change requires restart)
#max_parallel_workers_per_gather = 0    # taken from max_worker_processes
#old_snapshot_threshold = -1            # 1min-60d; -1 disables; 0 is immediate
                                        # (change requires restart)
#backend_flush_after = 0                # measured in pages, 0 disables


# - Settings -

#wal_level = minimal                    # minimal, replica, or logical
                                        # (change requires restart)
#fsync = on                             # flush data to disk for crash safety
                                                # (turning this off can cause
                                                # unrecoverable data corruption)
#synchronous_commit = on                # synchronization level;
                                        # off, local, remote_write, remote_apply, or on
#wal_sync_method = fsync                # the default is the first option
                                        # supported by the operating system:
                                        #   open_datasync
                                        #   fdatasync (default on Linux)
                                        #   fsync
                                        #   fsync_writethrough
                                        #   open_sync
#full_page_writes = on                  # recover from partial page writes
#wal_compression = off                  # enable compression of full-page writes
#wal_log_hints = off                    # also do full page writes of non-critical updates
                                        # (change requires restart)
#wal_buffers = -1                       # min 32kB, -1 sets based on shared_buffers
                                        # (change requires restart)
#wal_writer_delay = 200ms               # 1-10000 milliseconds
#wal_writer_flush_after = 1MB           # measured in pages, 0 disables

#commit_delay = 0                       # range 0-100000, in microseconds
#commit_siblings = 5                    # range 1-1000

# - Checkpoints -

#checkpoint_timeout = 5min              # range 30s-1d
#max_wal_size = 1GB
#min_wal_size = 80MB
#checkpoint_completion_target = 0.5     # checkpoint target duration, 0.0 - 1.0
#checkpoint_flush_after = 256kB         # measured in pages, 0 disables
#checkpoint_warning = 30s               # 0 disables

# - Archiving -

#archive_mode = off             # enables archiving; off, on, or always
                                # (change requires restart)
#archive_command = ''           # command to use to archive a logfile segment
                                # placeholders: %p = path of file to archive
                                #               %f = file name only
                                # e.g. 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
#archive_timeout = 0            # force a logfile segment switch after this
                                # number of seconds; 0 disables


# - Sending Server(s) -

# Set these on the master and on any standby that will send replication data.

#max_wal_senders = 0            # max number of walsender processes
                                # (change requires restart)
#wal_keep_segments = 0          # in logfile segments, 16MB each; 0 disables
#wal_sender_timeout = 60s       # in milliseconds; 0 disables

#max_replication_slots = 0      # max number of replication slots
                                # (change requires restart)
#track_commit_timestamp = off   # collect timestamp of transaction commit
                                # (change requires restart)

# - Master Server -

# These settings are ignored on a standby server.

#synchronous_standby_names = '' # standby servers that provide sync rep
                                # number of sync standbys and comma-separated list of application_name
                                # from standby(s); '*' = all
#vacuum_defer_cleanup_age = 0   # number of xacts by which cleanup is delayed

# - Standby Servers -

# These settings are ignored on a master server.

#hot_standby = off                      # "on" allows queries during recovery
                                        # (change requires restart)
#max_standby_archive_delay = 30s        # max delay before canceling queries
                                        # when reading WAL from archive;
                                        # -1 allows indefinite delay
#max_standby_streaming_delay = 30s      # max delay before canceling queries
                                        # when reading streaming WAL;
                                        # -1 allows indefinite delay
#wal_receiver_status_interval = 10s     # send replies at least this often
                                        # 0 disables
#hot_standby_feedback = off             # send info from standby to prevent
                                        # query conflicts
#wal_receiver_timeout = 60s             # time that receiver waits for
                                        # communication from master
                                        # in milliseconds; 0 disables
#wal_retrieve_retry_interval = 5s       # time to wait before retrying to
                                        # retrieve WAL after a failed attempt


# - Planner Method Configuration -

#enable_bitmapscan = on
#enable_hashagg = on
#enable_hashjoin = on
#enable_indexscan = on
#enable_indexonlyscan = on
#enable_material = on
#enable_mergejoin = on
#enable_nestloop = on
#enable_seqscan = on
#enable_sort = on
#enable_tidscan = on

# - Planner Cost Constants -

#seq_page_cost = 1.0                    # measured on an arbitrary scale
#random_page_cost = 4.0                 # same scale as above
#cpu_tuple_cost = 0.01                  # same scale as above
#cpu_index_tuple_cost = 0.005           # same scale as above
#cpu_operator_cost = 0.0025             # same scale as above
#parallel_tuple_cost = 0.1              # same scale as above
#parallel_setup_cost = 1000.0   # same scale as above
#min_parallel_relation_size = 8MB
#effective_cache_size = 4GB

# - Genetic Query Optimizer -

#geqo = on
#geqo_threshold = 12
#geqo_effort = 5                        # range 1-10
#geqo_pool_size = 0                     # selects default based on effort
#geqo_generations = 0                   # selects default based on effort
#geqo_selection_bias = 2.0              # range 1.5-2.0
#geqo_seed = 0.0                        # range 0.0-1.0

# - Other Planner Options -

#default_statistics_target = 100        # range 1-10000
#constraint_exclusion = partition       # on, off, or partition
#cursor_tuple_fraction = 0.1            # range 0.0-1.0
#from_collapse_limit = 8
#join_collapse_limit = 8                # 1 disables collapsing of explicit
                                        # JOIN clauses
#force_parallel_mode = off


# - Where to Log -

#log_destination = 'stderr'             # Valid values are combinations of
                                        # stderr, csvlog, syslog, and eventlog,
                                        # depending on platform.  csvlog
                                        # requires logging_collector to be on.

# This is used when logging to stderr:
#logging_collector = off                # Enable capturing of stderr and csvlog
                                        # into log files. Required to be on for
                                        # csvlogs.
                                        # (change requires restart)

# These are only used if logging_collector is on:
#log_directory = 'pg_log'               # directory where log files are written,
                                        # can be absolute or relative to PGDATA
#log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'        # log file name pattern,
                                        # can include strftime() escapes
#log_file_mode = 0600                   # creation mode for log files,
                                        # begin with 0 to use octal notation
#log_truncate_on_rotation = off         # If on, an existing log file with the
                                        # same name as the new log file will be
                                        # truncated rather than appended to.
                                        # But such truncation only occurs on
                                        # time-driven rotation, not on restarts
                                        # or size-driven rotation.  Default is
                                        # off, meaning append to existing files
                                        # in all cases.
#log_rotation_age = 1d                  # Automatic rotation of logfiles will
                                        # happen after that time.  0 disables.
#log_rotation_size = 10MB               # Automatic rotation of logfiles will
                                        # happen after that much log output.
                                        # 0 disables.

# These are relevant when logging to syslog:
#syslog_facility = 'LOCAL0'
#syslog_ident = 'postgres'
#syslog_sequence_numbers = on
#syslog_split_messages = on

# This is only relevant when logging to eventlog (win32):
#event_source = 'PostgreSQL'

# - When to Log -

#client_min_messages = notice           # values in order of decreasing detail:
                                        #   debug5
                                        #   debug4
                                        #   debug3
                                        #   debug2
                                        #   debug1
                                        #   log
                                        #   notice
                                        #   warning
                                        #   error

#log_min_messages = warning             # values in order of decreasing detail:
                                        #   debug5
                                        #   debug4
                                        #   debug3
                                        #   debug2
                                        #   debug1
                                        #   info
                                        #   notice
                                        #   warning
                                        #   error
                                        #   log
                                        #   fatal
                                        #   panic

#log_min_error_statement = error        # values in order of decreasing detail:
                                        #   debug5
                                        #   debug4
                                        #   debug3
                                        #   debug2
                                        #   debug1
                                        #   info
                                        #   notice
                                        #   warning
                                        #   error
                                        #   log
                                        #   fatal
                                        #   panic (effectively off)

#log_min_duration_statement = -1        # -1 is disabled, 0 logs all statements
                                        # and their durations, > 0 logs only
                                        # statements running at least this number
                                        # of milliseconds

# - What to Log -

#debug_print_parse = off
#debug_print_rewritten = off
#debug_print_plan = off
#debug_pretty_print = on
#log_checkpoints = off
#log_connections = off
#log_disconnections = off
#log_duration = off
#log_error_verbosity = default          # terse, default, or verbose messages
#log_hostname = off
#log_line_prefix = ''                   # special values:
                                        #   %a = application name
                                        #   %u = user name
                                        #   %d = database name
                                        #   %r = remote host and port
                                        #   %h = remote host
                                        #   %p = process ID
                                        #   %t = timestamp without milliseconds
                                        #   %m = timestamp with milliseconds
                                        #   %n = timestamp with milliseconds (as a Unix epoch)
                                        #   %i = command tag
                                        #   %e = SQL state
                                        #   %c = session ID
                                        #   %l = session line number
                                        #   %s = session start timestamp
                                        #   %v = virtual transaction ID
                                        #   %x = transaction ID (0 if none)
                                        #   %q = stop here in non-session
                                        #        processes
                                        #   %% = '%'
                                        # e.g. '<%u%%%d> '
#log_lock_waits = off                   # log lock waits >= deadlock_timeout
#log_statement = 'none'                 # none, ddl, mod, all
#log_replication_commands = off
#log_temp_files = -1                    # log temporary files equal or larger
                                        # than the specified size in kilobytes;
                                        # -1 disables, 0 logs all temp files
log_timezone = 'UTC'

# - Process Title -

#cluster_name = ''                      # added to process titles if nonempty
                                        # (change requires restart)
#update_process_title = on


# - Query/Index Statistics Collector -

#track_activities = on
#track_counts = on
#track_io_timing = off
#track_functions = none                 # none, pl, all
#track_activity_query_size = 1024       # (change requires restart)
#stats_temp_directory = 'pg_stat_tmp'

# - Statistics Monitoring -

#log_parser_stats = off
#log_planner_stats = off
#log_executor_stats = off
#log_statement_stats = off


#autovacuum = on                        # Enable autovacuum subprocess?  'on'
                                        # requires track_counts to also be on.
#log_autovacuum_min_duration = -1       # -1 disables, 0 logs all actions and
                                        # their durations, > 0 logs only
                                        # actions running at least this number
                                        # of milliseconds.
#autovacuum_max_workers = 3             # max number of autovacuum subprocesses
                                        # (change requires restart)
#autovacuum_naptime = 1min              # time between autovacuum runs
#autovacuum_vacuum_threshold = 50       # min number of row updates before
                                        # vacuum
#autovacuum_analyze_threshold = 50      # min number of row updates before
                                        # analyze
#autovacuum_vacuum_scale_factor = 0.2   # fraction of table size before vacuum
#autovacuum_analyze_scale_factor = 0.1  # fraction of table size before analyze
#autovacuum_freeze_max_age = 200000000  # maximum XID age before forced vacuum
                                        # (change requires restart)
#autovacuum_multixact_freeze_max_age = 400000000        # maximum multixact age
                                        # before forced vacuum
                                        # (change requires restart)
#autovacuum_vacuum_cost_delay = 20ms    # default vacuum cost delay for
                                        # autovacuum, in milliseconds;
                                        # -1 means use vacuum_cost_delay
#autovacuum_vacuum_cost_limit = -1      # default vacuum cost limit for
                                        # autovacuum, -1 means use
                                        # vacuum_cost_limit


# - Statement Behavior -

#search_path = '"$user", public'        # schema names
#default_tablespace = ''                # a tablespace name, '' uses the default
#temp_tablespaces = ''                  # a list of tablespace names, '' uses
                                        # only default tablespace
#check_function_bodies = on
#default_transaction_isolation = 'read committed'
#default_transaction_read_only = off
#default_transaction_deferrable = off
#session_replication_role = 'origin'
#statement_timeout = 0                  # in milliseconds, 0 is disabled
#lock_timeout = 0                       # in milliseconds, 0 is disabled
#idle_in_transaction_session_timeout = 0                # in milliseconds, 0 is disabled
#vacuum_freeze_min_age = 50000000
#vacuum_freeze_table_age = 150000000
#vacuum_multixact_freeze_min_age = 5000000
#vacuum_multixact_freeze_table_age = 150000000
#bytea_output = 'hex'                   # hex, escape
#xmlbinary = 'base64'
#xmloption = 'content'
#gin_fuzzy_search_limit = 0
#gin_pending_list_limit = 4MB

# - Locale and Formatting -

datestyle = 'iso, mdy'
#intervalstyle = 'postgres'
timezone = 'UTC'
#timezone_abbreviations = 'Default'     # Select the set of available time zone
                                        # abbreviations.  Currently, there are
                                        #   Default
                                        #   Australia (historical usage)
                                        #   India
                                        # You can create your own file in
                                        # share/timezonesets/.
#extra_float_digits = 0                 # min -15, max 3
#client_encoding = sql_ascii            # actually, defaults to database
                                        # encoding

# These settings are initialized by initdb, but they can be changed.
lc_messages = 'en_US.utf-8'                     # locale for system error message
                                        # strings
lc_monetary = 'en_US.utf-8'                     # locale for monetary formatting
lc_numeric = 'en_US.utf-8'                      # locale for number formatting
lc_time = 'en_US.utf-8'                         # locale for time formatting

# default configuration for text search
default_text_search_config = 'pg_catalog.english'

# - Other Defaults -

#dynamic_library_path = '$libdir'
#local_preload_libraries = ''
#session_preload_libraries = ''


#deadlock_timeout = 1s
#max_locks_per_transaction = 64         # min 10
                                        # (change requires restart)
#max_pred_locks_per_transaction = 64    # min 10
                                        # (change requires restart)


# - Previous PostgreSQL Versions -

#array_nulls = on
#backslash_quote = safe_encoding        # on, off, or safe_encoding
#default_with_oids = off
#escape_string_warning = on
#lo_compat_privileges = off
#operator_precedence_warning = off
#quote_all_identifiers = off
#sql_inheritance = on
#standard_conforming_strings = on
#synchronize_seqscans = on

# - Other Platforms and Clients -

#transform_null_equals = off


#exit_on_error = off                    # terminate session on any error?
#restart_after_crash = on               # reinitialize after backend crash?


# These options allow settings to be loaded from files other than the
# default postgresql.conf.

#include_dir = 'conf.d'                 # include files ending in '.conf' from
                                        # directory 'conf.d'
#include_if_exists = 'exists.conf'      # include file only if it exists
#include = 'special.conf'               # include file


# Add settings for extensions here
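In the part of the file shown here, only a handful of lines are not commented out (log_timezone, datestyle, timezone, the lc_* settings and default_text_search_config). Everything else runs with its built-in default. Below is a minimal Python sketch that pulls out just the active settings. It assumes the simple `name = value  # comment` layout used above. It does not follow include directives and does not handle quoted values that contain `#`, so treat it as a quick inspection aid rather than a full parser. On a running server, `SELECT name, setting FROM pg_settings WHERE source = 'configuration file';` gives the authoritative answer.

```python
import re

# "name = value" (the "=" is optional in postgresql.conf) plus an optional
# trailing "# comment". Simplification: a '#' inside a quoted value is not handled.
SETTING_RE = re.compile(r"^([A-Za-z_][A-Za-z0-9_.]*)\s*=?\s*(.*?)\s*(?:#.*)?$")


def active_settings(conf_text):
    """Return {name: value} for the settings that are not commented out."""
    settings = {}
    for line in conf_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank line, or a commented-out default
        match = SETTING_RE.match(stripped)
        if match:
            name, value = match.groups()
            # a later occurrence overrides an earlier one, as in postgresql.conf
            settings[name] = value.strip("'")
    return settings

# Usage (example path; `SHOW config_file;` in psql prints the real one):
# with open("/etc/postgresql/11/main/postgresql.conf") as f:
#     print(active_settings(f.read()))
```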

To debug errors in Postgres client apps it is useful to check Postgres logs:

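$ tail -10 /usr/local/var/log/postgres.log

or, on Debian/Ubuntu:

$ tail -10 /var/log/postgresql/postgresql-M.m-main.log

(replace M.m with the actual version of the Postgres instance, e.g. 9.6; since PostgreSQL 10 only the major version is used, e.g. postgresql-11-main.log).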
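If `tail` is not at hand, or the check has to happen from inside a Python client app, you can get the same "last N lines" view by reading the file through a `collections.deque` bounded to N items. This is a minimal sketch. The path in the usage comment is only an example, and reading the server log may require elevated permissions.

```python
from collections import deque


def tail(path, n=10):
    """Return the last n lines of a text file, similar to `tail -n`."""
    with open(path, encoding="utf-8", errors="replace") as f:
        # a deque with maxlen keeps only the n most recent lines in memory
        return list(deque(f, maxlen=n))

# Usage (example path; on Debian/Ubuntu the file name contains the version):
# print("".join(tail("/var/log/postgresql/postgresql-11-main.log")))
```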


$ psql --help
psql is the PostgreSQL interactive terminal.


General options:
  -c, --command=COMMAND    run only single command (SQL or internal) and exit

  -d, --dbname=DBNAME      database name to connect to (default: "dbuser") // this is the name returned by whoami command in shell

  -f, --file=FILENAME      execute commands from file, then exit

  -l, --list               list available databases, then exit

  -v, --set=, --variable=NAME=VALUE
                           set psql variable NAME to VALUE
                           (e.g., -v ON_ERROR_STOP=1)
  -V, --version            output version information, then exit
  -X, --no-psqlrc          do not read startup file (~/.psqlrc)
  -1 ("one"), --single-transaction
                           execute as a single transaction (if non-interactive)
  -?, --help[=options]     show this help, then exit
      --help=commands      list backslash commands, then exit
      --help=variables     list special variables, then exit

Input and output options:
  -a, --echo-all           echo all input from script
  -b, --echo-errors        echo failed commands
  -e, --echo-queries       echo commands sent to server

  -E, --echo-hidden        display queries that internal commands generate // If you start psql with the -E flag, it will display the real query when you use a meta-command.

  -L, --log-file=FILENAME  send session log to file
  -n, --no-readline        disable enhanced command line editing (readline)
  -o, --output=FILENAME    send query results to file (or |pipe)
  -q, --quiet              run quietly (no messages, only query output)
  -s, --single-step        single-step mode (confirm each query)
  -S, --single-line        single-line mode (end of line terminates SQL command)

Output format options:
  -A, --no-align           unaligned table output mode
  -F, --field-separator=STRING
                           field separator for unaligned output (default: "|")
  -H, --html               HTML table output mode
  -P, --pset=VAR[=ARG]     set printing option VAR to ARG (see \pset command)
  -R, --record-separator=STRING
                           record separator for unaligned output (default: newline)
  -t, --tuples-only        print rows only
  -T, --table-attr=TEXT    set HTML table tag attributes (e.g., width, border)
  -x, --expanded           turn on expanded table output
  -z, --field-separator-zero
                           set field separator for unaligned output to zero byte
  -0, --record-separator-zero
                           set record separator for unaligned output to zero byte

Connection options:
  -h, --host=HOSTNAME      database server host or socket directory (default: "local socket")
  -p, --port=PORT          database server port (default: "5432")

  -U, --username=USERNAME  database user name (default: "dbuser")

  -w, --no-password        never prompt for password
  -W, --password           force password prompt (should happen automatically)

For more information, type "\?" (for internal commands) or "\help" (for SQL
commands) from within psql, or consult the psql section in the PostgreSQL documentation.

Report bugs to <>.
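When psql is called from a script rather than used interactively, a few of the options listed above make its output easy to consume:

- -X skips ~/.psqlrc.
- -A switches to unaligned output, with "|" as the default field separator.
- -t drops the headers and the row-count footer.
- -c runs a single command and exits.

The Python sketch below only assembles such a command line and splits the result. The helper names are hypothetical, and it assumes psql is on PATH. Splitting on "|" breaks if a value itself contains "|"; -F can select another separator.

```python
import subprocess


def psql_command(dbname, sql, user=None, host=None, port=None):
    """Build an argument list for a one-off, script-friendly psql call."""
    cmd = ["psql",
           "-X",   # --no-psqlrc: ignore ~/.psqlrc for reproducible output
           "-A",   # --no-align: unaligned output, fields separated by "|"
           "-t"]   # --tuples-only: rows only, no header or footer
    if user:
        cmd += ["-U", user]
    if host:
        cmd += ["-h", host]
    if port:
        cmd += ["-p", str(port)]
    return cmd + ["-d", dbname, "-c", sql]


def run_psql(dbname, sql, **conn):
    """Run the command and return each output row as a list of fields."""
    result = subprocess.run(psql_command(dbname, sql, **conn),
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return [line.split("|") for line in result.stdout.splitlines() if line]

# Usage (needs a reachable server):
# run_psql("postgres", "select datname, datistemplate from pg_database")
```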

$ psql
psql: FATAL:  database "dbuser" does not exist

By default psql connects to a database whose name is the same as the database user name, and the user name in turn defaults to the name of the operating-system user running psql. As database dbuser does not exist, psql prints the error above.
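This rule can be sketched in Python as below. The sketch approximates libpq's defaults and assumes that no user or database is given on the command line:

- The user name comes from the PGUSER environment variable or, failing that, the operating-system user.
- The database name comes from PGDATABASE or, failing that, the user name.

The function name is hypothetical.

```python
import getpass
import os


def default_connection(env=None):
    """Approximate the user and database psql uses when none are given.

    PGUSER, else the OS login name; PGDATABASE, else the user name.
    """
    env = os.environ if env is None else env
    user = env.get("PGUSER") or getpass.getuser()  # usually what `whoami` prints
    dbname = env.get("PGDATABASE") or user         # database defaults to user name
    return user, dbname
```

So to avoid the error, either pass a database explicitly (as in `psql postgres` below) or set PGDATABASE in the shell.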

$ psql --version
psql (PostgreSQL) 11.2

To list all databases (including the template databases):

$ psql -l
                                   List of databases
        Name        | Owner  | Encoding |   Collate   |    Ctype    | Access privileges 
 jsonschema2db-test | dbuser | UTF8     | en_US.utf-8 | en_US.utf-8 | 
 postgres           | dbuser | UTF8     | en_US.utf-8 | en_US.utf-8 | 
 template0          | dbuser | UTF8     | en_US.utf-8 | en_US.utf-8 | =c/dbuser        +
                    |        |          |             |             | dbuser=CTc/dbuser
 template1          | dbuser | UTF8     | en_US.utf-8 | en_US.utf-8 | =c/dbuser        +
                    |        |          |             |             | dbuser=CTc/dbuser
(4 rows)

Now we can connect to some database:

$ psql postgres
psql (11.2)
Type "help" for help.

To disconnect from it and connect to some other database, use

\connect db_name

or its short form:

\c db_name

We can now create a new database:

postgres=# create database testdb;
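A name like jsonschema2db-test (used throughout this post) is not a valid unquoted SQL identifier because of the hyphen, so `create database jsonschema2db-test;` fails with a syntax error. The name has to be written as a double-quoted identifier. The Python sketch below shows the quoting rule: wrap the name in double quotes and double any embedded double quote. The function names are illustrative. Quoted identifiers are case-sensitive. In application code, psycopg2's `sql.Identifier` does this quoting for you, and the `createdb` shell command handles it as well.

```python
def quote_ident(name):
    """Quote a PostgreSQL identifier: wrap it in double quotes and
    double any double quote it contains."""
    return '"' + name.replace('"', '""') + '"'


def create_database_sql(name):
    """Build a CREATE DATABASE statement for an arbitrary database name."""
    return "CREATE DATABASE " + quote_ident(name) + ";"

print(create_database_sql("jsonschema2db-test"))
# CREATE DATABASE "jsonschema2db-test";
```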

To list databases once connected to some database, use the \l meta-command:

postgres=# \l
                                   List of databases
        Name        | Owner  | Encoding |   Collate   |    Ctype    | Access privileges 
 jsonschema2db-test | dbuser | UTF8     | en_US.utf-8 | en_US.utf-8 | 
 postgres           | dbuser | UTF8     | en_US.utf-8 | en_US.utf-8 | 
 template0          | dbuser | UTF8     | en_US.utf-8 | en_US.utf-8 | =c/dbuser        +
                    |        |          |             |             | dbuser=CTc/dbuser
 template1          | dbuser | UTF8     | en_US.utf-8 | en_US.utf-8 | =c/dbuser        +
                    |        |          |             |             | dbuser=CTc/dbuser
 testdb             | dbuser | UTF8     | en_US.utf-8 | en_US.utf-8 | 
(5 rows)

To see extended info on existing databases:

testdb=# \l+
                                                                     List of databases
        Name        | Owner  | Encoding |   Collate   |    Ctype    | Access privileges |  Size   | Tablespace |                Description                 
 jsonschema2db-test | dbuser | UTF8     | en_US.utf-8 | en_US.utf-8 |                   | 7763 kB | pg_default | 
 postgres           | dbuser | UTF8     | en_US.utf-8 | en_US.utf-8 |                   | 7699 kB | pg_default | default administrative connection database
 template0          | dbuser | UTF8     | en_US.utf-8 | en_US.utf-8 | =c/dbuser        +| 7561 kB | pg_default | unmodifiable empty database
                    |        |          |             |             | dbuser=CTc/dbuser |         |            | 
 template1          | dbuser | UTF8     | en_US.utf-8 | en_US.utf-8 | =c/dbuser        +| 7561 kB | pg_default | default template for new databases
                    |        |          |             |             | dbuser=CTc/dbuser |         |            | 
 testdb             | dbuser | UTF8     | en_US.utf-8 | en_US.utf-8 |                   | 7699 kB | pg_default | 
(5 rows)

Schemas are namespaces: different schemas may contain tables with the same name.

To list all tables in the public schema use:

\dt public.*

To list all tables in all schemas:

\dt *.* 

To list tables in a particular schema use:

\dt schema_name.*

List tables in a PostgreSQL schema
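Starting psql with -E (see the help output above) shows the catalog query that `\dt` actually runs. For application code, you can build a roughly equivalent query against the standard information_schema.tables view, as below. This is a sketch, not what psql runs. information_schema only shows tables on which the current user has some privilege. The %s placeholder assumes a DB-API driver such as psycopg2, and the function name is hypothetical.

```python
def list_tables_query(schema=None):
    """Return (sql, params) listing base tables, roughly like \\dt."""
    sql = ("SELECT table_schema, table_name "
           "FROM information_schema.tables "
           "WHERE table_type = 'BASE TABLE'")
    params = []
    if schema is None:
        # like `\dt *.*`, but without the system catalogs
        sql += " AND table_schema NOT IN ('pg_catalog', 'information_schema')"
    else:
        # like `\dt schema_name.*`
        sql += " AND table_schema = %s"
        params.append(schema)
    return sql + " ORDER BY table_schema, table_name", params

# Usage with a DB-API cursor (e.g. psycopg2):
# cur.execute(*list_tables_query("public"))
```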

$ psql jsonschema2db-test
psql (11.2)
Type "help" for help.


To list roles:

jsonschema2db-test=# \du
                                   List of roles
 Role name |                         Attributes                         | Member of
 dbuser    | Superuser, Create role, Create DB, Replication, Bypass RLS | {}

To list all constraints (and indexes) of some table with psql:

my_db=# \d+ schema_name.table_name;

For example:

my_db=# \d+ my_schema.my_table;
                                                                              Table "my_schema.my_table"
                      Column                       |       Type       | Collation | Nullable |                      Default                      | Storage  | Stats target | Description 
 id                                                | integer          |           | not null | nextval('block_filterlist.root_id_seq'::regclass) | plain    |              | 
 item_id                                           | bigint           |           | not null |                                                   | plain    |              | 
 prefix                                            | text             |           | not null |                                                   | extended |              | 
 attachments                                       | text             |           |          |                                                   | extended |              | 
 type                                              | text             |           |          |                                                   | extended |              | 
 url                                               | text             |           |          |                                                   | extended |              | 
Indexes:
    "root_id_key" UNIQUE CONSTRAINT, btree (id)
    "root_item_id_prefix_key" UNIQUE CONSTRAINT, btree (item_id, prefix)

If you try to insert two rows with the same values of both item_id and prefix, Postgres will issue an error:

ERROR:  duplicate key value violates unique constraint "root_item_id_prefix_key"
DETAIL:  Key (item_id, prefix)=(0, 0) already exists.
STATEMENT:  insert into my_schema.my_table (item_id, post_id, prefix) values(0, 1234, 0) returning id
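You can try out the behaviour of a composite UNIQUE constraint such as root_item_id_prefix_key without a Postgres server. The sketch below swaps in SQLite from the Python standard library, which enforces UNIQUE (item_id, prefix) the same way for non-NULL values: only a repeat of the whole (item_id, prefix) pair is rejected. The table is a trimmed, hypothetical copy of my_table. With psycopg2 against Postgres, the rejected insert raises an IntegrityError (subclass) carrying the error shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE my_table (
        id      INTEGER PRIMARY KEY,
        item_id BIGINT NOT NULL,
        prefix  TEXT   NOT NULL,
        UNIQUE (item_id, prefix)  -- same composite key as root_item_id_prefix_key
    )""")


def try_insert(item_id, prefix):
    """Insert a row; return False if the composite UNIQUE constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO my_table (item_id, prefix) VALUES (?, ?)",
                         (item_id, prefix))
        return True
    except sqlite3.IntegrityError:
        return False


results = [try_insert(0, "0"),  # first row
           try_insert(0, "1"),  # same item_id, different prefix: allowed
           try_insert(1, "0"),  # different item_id, same prefix: allowed
           try_insert(0, "0")]  # duplicate (item_id, prefix) pair: rejected
print(results)  # [True, True, True, False]
```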

PostgreSQL Docker Image

If the Dockerfile contains a command that starts the Postgres server:

CMD pg_ctl -w start

...then running the Docker image fails:

$ docker run jsonschema2db-demo
pg_ctl: cannot be run as root
Please log in (using, e.g., "su") as the (unprivileged) user that will
own the server process.

This is because pg_ctl refuses to run as root, while Docker containers run as root by default:

“By default docker containers run as root. (…) As docker matures, more secure default options may become available. For now, requiring root is dangerous for others and may not be available in all environments. Your image should use the USER instruction to specify a non-root user for containers to run as”. (From Guidance for Docker Image Authors)

Quick fix:

RUN adduser --disabled-password dbuser
USER dbuser
CMD pg_ctl -w start

With a non-root user in place, running

CMD pg_ctl -w start

might output the following error:

pg_ctl: directory "/var/lib/postgresql/data" is not a database cluster directory

You're getting this error because there is no database cluster created inside the postgres docker image when you're attempting to run the pg_ctl start command.

The database cluster is created when you run a Docker container based on the official postgres image, as the initdb binary is called as part of the script that is set as the ENTRYPOINT for that container. When we build our own image, the cluster has to be created explicitly in the Dockerfile:

RUN initdb


Execution of

RUN initdb

might output the following error:

initdb: could not change permissions of directory "/var/lib/postgresql/data": Operation not permitted
The files belonging to this database system will be owned by user "dbuser".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /var/lib/postgresql/data ... The command '/bin/sh -c initdb' returned a non-zero code: 1

From Running PostgreSQL using Docker Compose:

Docker volumes are the recommended way to persist data. These are file systems managed by the Docker daemon and more often than not you are expected to create one and mount it inside your container when you launch it. The Postgres official image, however, comes with a VOLUME predefined in its image description.

This means that when you run a PostgreSQL image as a container, it creates a volume for itself and stores data in there.

Let's inspect the docker container mypostgres to see volumes mounted inside the database container:

$ docker inspect mypostgres

            "Env": [
            "Volumes": {
                "/var/lib/postgresql/data": {}
            "Entrypoint": [

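The full `docker inspect` output is a JSON array with one object per container, so you can also extract the volume information programmatically. In the sketch below, Config.Volumes lists the volumes declared by the image's VOLUME instruction. Mounts lists what is actually attached, including the host-side Source path. The helper is hypothetical, not part of Docker. On the command line, `docker inspect -f '{{ json .Mounts }}' mypostgres` gives a similar view.

```python
import json


def container_volumes(inspect_json):
    """Return (declared volume paths, [(host source, container destination)])
    from the JSON printed by `docker inspect <container>`."""
    info = json.loads(inspect_json)[0]
    # volumes declared by the image (VOLUME instruction)
    declared = list(((info.get("Config") or {}).get("Volumes")) or {})
    # mounts actually attached to this container
    mounts = [(m.get("Source"), m.get("Destination"))
              for m in (info.get("Mounts") or [])]
    return declared, mounts

# Usage:
# import subprocess
# print(container_volumes(subprocess.check_output(["docker", "inspect", "mypostgres"])))
```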
From Postgres docs:

Traditionally, the configuration and data files used by a database cluster are stored together within the cluster's data directory, commonly referred to as PGDATA (after the name of the environment variable that can be used to define it). A common location for PGDATA is /var/lib/pgsql/data.

The cause of the error above is that /var/lib/postgresql/data is not owned by dbuser, the non-root user that runs initdb. As a result, initdb is not permitted to change the directory's permissions while initializing the cluster.

The fix is to set PGDATA to some other directory that is owned by (or can be created by) the user that runs initdb:

USER dbuser
ENV PGDATA /home/dbuser/pgdata
RUN initdb
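Checking the target directory before running initdb can save a build cycle. The sketch below encodes the conditions discussed here and assumes a POSIX system:

- initdb refuses to run as root.
- initdb must be able to change the permissions of an existing data directory, which only its owner can do.
- initdb refuses a directory that is not empty.
- initdb creates a missing directory itself, so the parent must be writable.

The function name is hypothetical.

```python
import os


def can_init_cluster_in(path):
    """Rough pre-flight check (POSIX only) for an initdb data directory."""
    if os.geteuid() == 0:
        return False  # initdb refuses to run as root
    if os.path.isdir(path):
        # an existing directory must be owned by us (so initdb can chmod it)
        # and must be empty
        return os.stat(path).st_uid == os.geteuid() and not os.listdir(path)
    # a missing directory is created by initdb, so its parent must be writable
    parent = os.path.dirname(os.path.abspath(path))
    return os.access(parent, os.W_OK)

# Usage inside the container, as the user that will run initdb:
# print(can_init_cluster_in(os.environ.get("PGDATA", "/var/lib/postgresql/data")))
```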



RUN initdb

...might output the following:

 ---> Running in 7175f5b08c6b
The files belonging to this database system will be owned by user "dbuser".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.utf8".

The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

creating directory /home/dbuser/pgdata ... ok

creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... sh: locale: not found
2019-04-25 11:51:46.011 UTC [10] WARNING:  no usable system locales were found
syncing data to disk ... ok

Success. You can now start the database server using:

    pg_ctl -D /home/dbuser/pgdata -l logfile start

To fix this locale warning, set the LC_ALL environment variable:

USER dbuser
ENV PGDATA /home/dbuser/pgdata
ENV LC_ALL=en_US.utf-8
RUN initdb


When executing

CMD pg_ctl -w start && createdb jsonschema2db-test && python3

the Postgres container exits as soon as all the commands have finished, since a container only runs as long as its main process. To keep the container running until we stop it, we can launch bash at the end:

CMD pg_ctl -w start && createdb jsonschema2db-test && python3 && /bin/bash

and run this container in the background:

$ docker run -dt mydb

Use docker ps to verify this.


How to query Postgres running in a container

First open a bash in it:

$ docker exec -it c5158bf4cb0c bash

Then start the psql terminal from that shell:

$ psql

Example (DB_USER=postgres, DB_PASSWORD=postgres; bash user is root):

$ docker exec -it my_app_db_1 bash

root@4c0cce87ba63:/# psql -l
psql: FATAL:  role "root" does not exist

root@4c0cce87ba63:/# psql
psql: FATAL:  role "root" does not exist

root@4c0cce87ba63:/# psql -l user=postgres
                                 List of databases
    Name    |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges   
 postgres   | postgres | UTF8     | en_US.utf8 | en_US.utf8 | 
 template0  | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
            |          |          |            |            | postgres=CTc/postgres
 template1  | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
            |          |          |            |            | postgres=CTc/postgres
 my_app_dev | postgres | UTF8     | en_US.utf8 | en_US.utf8 | 
(4 rows)

root@4c0cce87ba63:/# psql my_app_dev
psql: FATAL:  role "root" does not exist

root@4c0cce87ba63:/# psql my_app_dev postgres
psql (11.3 (Debian 11.3-1.pgdg90+1))
Type "help" for help.

my_app_dev=# \dn
   List of schemas
   Name    |  Owner
 my_schema | postgres
 public    | postgres
 test      | postgres
(3 rows)

my_app_dev=# select * from my_schema.root;
 id | item_id | prefix | attachments |....status | tags | title | title_plain | type | url 
(0 rows)
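The `FATAL: role "root" does not exist` errors above come from psql's defaults. With no user name given (and PGUSER unset), psql connects as the operating-system user, which is root here. With no database given, it uses a database named after that role. That is why `psql my_app_dev` still fails while `psql my_app_dev postgres` works. Below is a simplified sketch of this resolution. The function name is hypothetical, and only the positional arguments plus the PGUSER/PGDATABASE variables are modelled.

```python
import getpass


def resolve_psql_target(args=(), env=None, os_user=None):
    """Return (role, database) the way psql picks them, simplified.

    Models only `psql [dbname [username]]` plus the PGDATABASE and
    PGUSER variables; -U/-d options and connection strings such as
    `user=postgres` are not handled.
    """
    # Defaults to an empty environment (not os.environ) to keep results predictable.
    env = {} if env is None else env
    if os_user is None:
        os_user = getpass.getuser()  # e.g. "root" inside the container
    role = args[1] if len(args) > 1 else env.get("PGUSER", os_user)
    dbname = args[0] if len(args) > 0 else env.get("PGDATABASE")
    # With no database given, a database named after the role is used.
    return role, dbname or role
```

So inside the container, either pass the role explicitly or export PGUSER=postgres before calling psql.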



How to execute some SQL queries (e.g. create a new database) upon launching Postgres in Docker container?

We usually want to build the database structure on startup, so we keep all the CREATE TABLE statements in init.sql (and other scripts). This file has to be mapped into the /docker-entrypoint-initdb.d directory, where the Postgres image picks it up and executes the SQL scripts.

The following example shows how to create a database when Postgres launches. Note that init scripts run only the first time the database is initialized.

The data directory (the named volume from your compose file) must be empty when you start the container. The scripts run only when the database is first created; if you run compose down/stop/kill and then up again, nothing new will be run. [source]
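The rule above can be summarised as a small model. This is a simplified sketch, not the image's actual entrypoint (which is a shell script). It assumes the entrypoint treats the cluster as initialised once PGDATA contains a non-empty PG_VERSION file. It also assumes that init files run in name order. Only the .sql, .sql.gz and .sh extensions are modelled, and the function name is hypothetical.

```python
import os


def init_scripts_to_run(pgdata, initdb_dir="/docker-entrypoint-initdb.d"):
    """Return the init scripts that would run for this data directory.

    Simplified model: scripts run only when the cluster has not been
    initialised yet (no non-empty PG_VERSION file), in name order.
    """
    version_file = os.path.join(pgdata, "PG_VERSION")
    if os.path.isfile(version_file) and os.path.getsize(version_file) > 0:
        return []  # existing cluster: init scripts are skipped
    if not os.path.isdir(initdb_dir):
        return []
    return sorted(
        name for name in os.listdir(initdb_dir)
        if name.endswith((".sql", ".sql.gz", ".sh"))
    )
```

Numeric prefixes such as 01_schema.sql and 02_data.sql (hypothetical names) are a simple way to control the order.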




docker-compose.yml:

version: '3'
services:
  db:
    image: postgres:latest
    volumes:
      - ./db/init.sql:/docker-entrypoint-initdb.d/init.sql

docker-compose up output:

2019-05-10 09:06:32.033 UTC [42] LOG:  database system is ready to accept connections
db_1         |  done
db_1         | server started
db_1         | /usr/local/bin/ running /docker-entrypoint-initdb.d/init.sql
db_1         | CREATE DATABASE
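A script or client that starts together with the container usually has to wait until the server accepts connections. Below is a minimal sketch that polls a TCP port until a connection succeeds. It assumes the port is reachable from where the script runs; to reach it from the host, the compose file would need a ports: mapping such as "5432:5432" (5432 is Postgres's default port). An open port is only a rough readiness signal, and pg_isready, which ships with Postgres, is the tool made for this check. The function name is hypothetical.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Poll host:port until a TCP connection succeeds or timeout expires.

    Returns True when the port accepts connections, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:  # refused, unreachable or timed out
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Usage: wait_for_port("localhost", 5432, timeout=60)
```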


pgAdmin can be used as a browser-based DB client. Once the DB server is added and the client is connected to an arbitrary DB, you can get additional views simply by duplicating that browser tab. I wrote a post about how to run it in a Docker container.