Thursday, 5 March 2026

How to install ClickHouse in an AWS EKS cluster via Altinity Helm charts and Terraform


 
Installing ClickHouse on an AWS EKS cluster using Terraform and the Altinity Helm charts typically involves two stages:
  1. Installing the Altinity ClickHouse Operator
  2. Deploying a ClickHouse Installation (CHI)
The Altinity Helm repository is located at https://helm.altinity.com.


Prerequisites


Ensure your Terraform environment is configured with the following providers:
  • aws: To manage EKS and underlying infrastructure.
  • kubernetes: To interact with the EKS cluster.
  • helm: To install the operator.
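For reference, a minimal provider wiring might look like the sketch below. The cluster name my-eks-cluster and the region are placeholders; adjust authentication to match your setup.

terraform {
  required_providers {
    aws        = { source = "hashicorp/aws" }
    kubernetes = { source = "hashicorp/kubernetes" }
    helm       = { source = "hashicorp/helm" }
  }
}

provider "aws" {
  region = "us-east-1" # placeholder
}

# Look up the existing EKS cluster so the kubernetes/helm providers can authenticate.
data "aws_eks_cluster" "this" {
  name = "my-eks-cluster" # placeholder
}

data "aws_eks_cluster_auth" "this" {
  name = "my-eks-cluster" # placeholder
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.this.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.this.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.this.token
}

provider "helm" {
  kubernetes {
    host                   = data.aws_eks_cluster.this.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.this.certificate_authority[0].data)
    token                  = data.aws_eks_cluster_auth.this.token
  }
}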

Terraform Configuration


The following example uses the helm_release resource to install the operator and the kubernetes_manifest resource to deploy the actual ClickHouse cluster.

Step A: Install the Altinity Operator


The operator is the "brain" that manages ClickHouse instances on Kubernetes.


resource "helm_release" "clickhouse_operator" {
  name             = "clickhouse-operator"
  repository       = "https://helm.altinity.com"
  chart            = "altinity-clickhouse-operator"
  namespace        = "clickhouse-operator"
  create_namespace = true

  # Optional: Enable metrics for Prometheus
  set {
    name  = "metrics.enabled"
    value = "true"
  }
}


Step B: Deploy a ClickHouse Cluster (CHI)


Once the operator is running, you define your ClickHouse cluster as a ClickHouseInstallation custom resource (the CRD is installed by the operator). In Terraform, you use kubernetes_manifest.


resource "kubernetes_manifest" "clickhouse_cluster" {
  depends_on = [helm_release.clickhouse_operator]

  manifest = {
    apiVersion = "clickhouse.altinity.com/v1"
    kind       = "ClickHouseInstallation"
    metadata = {
      name      = "simple-clickhouse"
      namespace = "clickhouse-operator"
    }
    spec = {
      configuration = {
        clusters = [
          {
            name = "cluster1"
            layout = {
              shardsCount   = 1
              replicasCount = 1
            }
          }
        ]
      }
    }
  }
}


Production Considerations for EKS


When running ClickHouse on EKS, you should consider storage and networking:
  • Storage Class: Use AWS gp3 volumes for a good balance of price and performance. You can specify a volumeClaimTemplate in your kubernetes_manifest (see the sketch after this list).
  • Node Affinity: It is recommended to run ClickHouse on specific node groups (e.g., using i3 or r5 instances) to ensure it doesn't compete with other workloads for IOPS.
  • Zookeeper/Keeper: For multi-node shards or replicas, you will need a Zookeeper cluster or the ClickHouse Keeper (also available via Altinity charts).
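As a sketch of the storage and Keeper points above, the spec of the kubernetes_manifest from Step B could be extended like this. The gp3 storage class name and the Keeper host/port are assumptions about your environment:

  spec = {
    configuration = {
      # Needed only for replicated setups: point ClickHouse at ZooKeeper/Keeper.
      # ZooKeeper's default port is 2181; ClickHouse Keeper often listens on 9181.
      zookeeper = {
        nodes = [
          { host = "clickhouse-keeper.clickhouse-keeper.svc.cluster.local", port = 2181 }
        ]
      }
      clusters = [
        {
          name = "cluster1"
          layout = {
            shardsCount   = 1
            replicasCount = 2
          }
          templates = {
            dataVolumeClaimTemplate = "data"
          }
        }
      ]
    }
    templates = {
      volumeClaimTemplates = [
        {
          name = "data"
          spec = {
            storageClassName = "gp3" # assumes a gp3 class exists in the cluster
            accessModes      = ["ReadWriteOnce"]
            resources = {
              requests = { storage = "100Gi" }
            }
          }
        }
      ]
    }
  }

Note that gp3 volumes require the EBS CSI driver and a corresponding StorageClass to be present in the cluster.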

EKS Module


Altinity maintains a dedicated Terraform EKS ClickHouse module that automates the entire VPC, EKS, and ClickHouse setup if you prefer a pre-packaged solution.
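Usage is roughly the following sketch; the input names are illustrative placeholders, so check the module's README for its actual variables:

module "eks_clickhouse" {
  source = "github.com/Altinity/terraform-aws-eks-clickhouse"

  # Illustrative placeholders -- see the module README for the real inputs.
  cluster_name = "clickhouse-eks"
  region       = "us-east-1"
}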


How to view the ClickHouse Installation configuration?



% kubectl get chi -n clickhouse -o yaml                      
apiVersion: v1
items:
- apiVersion: clickhouse.altinity.com/v1
  kind: ClickHouseInstallation
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
 {"apiVersion":"clickhouse.altinity.com/v1","kind":"ClickHouseInstallation","metadata":....}
    creationTimestamp: "2025-01-28T14:35:46Z"
    finalizers:
    - finalizer.clickhouseinstallation.altinity.com
    generation: 12
    name: clickhouse
    namespace: clickhouse
    resourceVersion: "67251031"
    uid: 9fxxxx1-81e7-429b-9cf7-ffxxxxxxef
  spec:
    configuration:
      clusters:
      - layout:
          replicasCount: 1
          shardsCount: 1
        name: ch
        templates:
          dataVolumeClaimTemplate: ch-data
          podTemplate: ch-pod
          serviceTemplate: ch-svc
      users:
        admin/grants/query: GRANT ALL ON *.*
        admin/networks/ip: 0.0.0.0/0
        admin/password: my-admin-password
        # or: admin/password_sha256_hex: <sha256 hex of the password>
        admin/profile: xxxx
        admin/quota: xxxxx
        admin/settings/enable_http_compression: 1
        default/k8s_secret_password_sha256_hex: <namespace/secretName/key>
        default/profile: default
        default/quota: default
    templates:
      podTemplates:
      - name: ch-pod
        spec:
          containers:
          - image: altinity/clickhouse-server:24.8.14.10544.altinitystable
            name: clickhouse
          - args:
            - server
            env:
            - name: LOG_LEVEL
              value: info
            - name: API_LISTEN
              value: 0.0.0.0:7171
            - name: API_CREATE_INTEGRATION_TABLES
              value: "true"
            - name: REMOTE_STORAGE
              value: s3
            - name: BACKUPS_TO_KEEP_REMOTE
              value: "2"
            - name: S3_BUCKET
              value: my-clickhouse-backups
            - name: S3_REGION
              value: us-east-1
            - name: CLICKHOUSE_HOST
              value: localhost
            - name: CLICKHOUSE_USERNAME
              value: xxxxx
            - name: CLICKHOUSE_PASSWORD
              value: xxxx
            image: altinity/clickhouse-backup:latest
            imagePullPolicy: IfNotPresent
            name: clickhouse-backup
          serviceAccountName: clickhouse-backup
          tolerations:
          - effect: NoSchedule
            key: karpenter/clickhouse
            operator: Exists
      serviceTemplates:
      - metadata:
          annotations:
            service.beta.kubernetes.io/aws-load-balancer-ip-address-type: ipv4
            service.beta.kubernetes.io/aws-load-balancer-name: my-clickhouse-nlb
            service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
            service.beta.kubernetes.io/aws-load-balancer-scheme: internal
            service.beta.kubernetes.io/aws-load-balancer-type: nlb
          name: clickhouse
        name: ch-svc
        spec:
          ports:
          - name: http
            port: 8123
            targetPort: 8123
          - name: native
            port: 9000
            targetPort: 9000
          type: LoadBalancer
      volumeClaimTemplates:
      - name: ch-data
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 100Gi
  status:
    chop-commit: 9abcd12
    chop-date: 2025-01-24T08:40:12
    chop-ip: 10.x.x.x
    chop-version: 0.25.5
    clusters: 1
    endpoint: clickhouse-clickhouse.clickhouse.svc.cluster.local
    fqdns:
    - chi-clickhouse-ch-0-0.clickhouse.svc.cluster.local
    hosts: 1
    hostsWithTablesCreated:
    - chi-clickhouse-ch-0-0.clickhouse.svc.cluster.local
    pods:
    - chi-clickhouse-ch-0-0-0
    shards: 1
    status: Completed
    taskID: auto-1xxxxd2-5ba4-4c3a-9daa-baxxxxx850
    taskIDsCompleted:
    - auto-1fxxxxxd2-5ba4-4c3a-9daa-baxxxxxx50
      ...
    - auto-bbxxxx6-31e3-4a4c-b04b-e5xxxxxx91
    taskIDsStarted:
    - auto-31xxxxx37-492f-4109-b515-4axxxxxx6c8
      ... 
    - auto-b8xxxxx7-0396-41e0-b5d1-95xxxxd48
kind: List
metadata:
  resourceVersion: ""


The users section shows user configuration in the form USER_NAME/ATTRIBUTE. In the example above we have two users: admin and default.

The USER_NAME/password value is a plain-text password. This is very convenient for debugging (though usually a security "no-no" for production, especially for the admin or default user!).

USER_NAME/password_sha256_hex is a SHA256-hashed password.

USER_NAME/k8s_secret_password_sha256_hex: <namespace/SECRET_NAME/KEY_NAME> shows that the USER_NAME ClickHouse user is secured using a Kubernetes Secret: the user's password is mapped to a specific key in that Secret.

  • USER_NAME/k8s_secret_password_sha256_hex: specifies that for the user named USER_NAME, the password should be read from a Kubernetes Secret as a SHA256 hex string.
  • <namespace/SECRET_NAME/KEY_NAME>: the reference to the secret itself, structured as namespace/SECRET_NAME/KEY_NAME.
  • Purpose: this allows for secure, GitOps-friendly password management, preventing plain-text passwords from appearing in Kubernetes manifests.
  • Implementation: the operator reads the secret and writes the hashed password into /etc/clickhouse-server/users.d/chop-generated-users.xml inside your ClickHouse pod. If you have External Secrets installed, the secret itself is likely being pulled from a backend such as AWS Secrets Manager.
  • Alternative: you can also use k8s_secret_env_password_sha256_hex to load the password via an environment variable.

In the Altinity Operator, the syntax USER_NAME/k8s_secret_password_sha256_hex is a pointer. It tells the operator to look into a specific secret to find the password hash for the USER_NAME user.
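As an illustration, here is how such a Secret could be created with Terraform and referenced from the CHI. The resource and key names are hypothetical; Terraform's sha256() function returns the hex digest, and the kubernetes provider base64-encodes data values for you:

variable "admin_password" {
  type      = string
  sensitive = true
}

resource "kubernetes_secret" "clickhouse_credentials" {
  metadata {
    name      = "clickhouse-credentials" # hypothetical name
    namespace = "clickhouse"
  }

  data = {
    # sha256() yields the hex digest of the plain-text password
    admin_password_sha256_hex = sha256(var.admin_password)
  }
}

The matching user entry in the CHI spec would then be:

  admin/k8s_secret_password_sha256_hex = "clickhouse/clickhouse-credentials/admin_password_sha256_hex"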

To read the stored value from the Secret (for *_sha256_hex keys this is the password hash, not the plain-text password):

% kubectl get secret <SECRET_NAME> \
    -n <NAMESPACE> \
    -o jsonpath="{.data.<KEY_NAME>}" \
    | base64 -d

NAMESPACE is usually clickhouse.


How to check ClickHouse health?


Since ClickHouse is running in our cluster, the best way to verify it's "working fine" is to move beyond just checking the Pod status and actually query the database engine itself.

Here is a step-by-step approach to verify health, connectivity, and data integrity.

1. The "Internal" Health Check


The quickest way is to execute a command directly inside the pod using the clickhouse-client. This bypasses networking issues and tells you if the engine is responsive. Run this command:

kubectl exec -it chi-clickhouse-ch-0-0 \
-n clickhouse \
-- clickhouse-client --query "SELECT version(), uptime()"

chi-clickhouse-ch-0-0 is the name of the pod; depending on the operator version and cluster layout it may also look like chi-clickhouse-ch-0-0-0.

If this returns data, it means ClickHouse is successfully reading from its system tables on the EBS volume.

What to look for: It should return the version string and the number of seconds the server has been up. If this fails, the DB engine itself is hung.

If the default user has a password set, or the default user was disabled, the output may show an error similar to this:

% kubectl exec -it chi-clickhouse-ch-0-0-0 -n clickhouse -- clickhouse-client --query "SELECT version(), uptime()"
Defaulted container "clickhouse" out of: clickhouse, clickhouse-backup
Code: 516. DB::Exception: Received from localhost:9000. DB::Exception: default: Authentication failed: password is incorrect, or there is no user with such name.

If you have installed ClickHouse and forgot password you can reset it in the configuration file.
The password for default user is typically located at /etc/clickhouse-server/users.d/default-password.xml
and deleting this file will reset the password.
See also /etc/clickhouse-server/users.xml on the server where ClickHouse is installed.

. (AUTHENTICATION_FAILED)

command terminated with exit code 4

Seeing an AUTHENTICATION_FAILED error instead of a Connection Refused error is actually a positive result for this check:
  • Networking works: Your kubectl exec reached the pod.
  • Process is alive: The ClickHouse server is running and actively rejecting bad logins.
  • Storage is mounted: ClickHouse can't check credentials if it can't read its config files from disk.

If we know the ClickHouse credentials, we can perform the health check:

% kubectl exec -it chi-clickhouse-ch-0-0-0 -n clickhouse -- clickhouse-client --user USER --password PASS --query "SELECT version(), uptime(), name FROM system.clusters"
Defaulted container "clickhouse" out of: clickhouse, clickhouse-backup
24.8.14.10544.altinitystable 501389 all-clusters
24.8.14.10544.altinitystable 501389 all-replicated
24.8.14.10544.altinitystable 501389 all-sharded
24.8.14.10544.altinitystable 501389 ch
24.8.14.10544.altinitystable 501389 default


The output above is exactly what we wanted to see. The database is responsive, healthy, and has an uptime of ~5.8 days (501,389 seconds). The version 24.8.14.10544.altinitystable indicates we are on a very recent, stable Altinity build.


2. Check Replication and Disk Health


Since you are using the Altinity Operator, ClickHouse is likely managing its data on dedicated volumes. You want to ensure the system tables report no errors. Run this to check that the disks are mounted and have free space:

% kubectl exec -it chi-clickhouse-ch-0-0-0 \
-n clickhouse \
-- clickhouse-client --user USER --password PASS \
--query "SELECT name, path, formatReadableSize(free_space) AS free, formatReadableSize(total_space) AS total FROM system.disks"

Defaulted container "clickhouse" out of: clickhouse, clickhouse-backup
default /var/lib/clickhouse/ 89.60 GiB 95.80 GiB

If you have multiple replicas (e.g., a ch-0-1 pod), check for replication lag:

% kubectl exec -it chi-clickhouse-ch-0-0-0 \
-n clickhouse \
-- clickhouse-client --user USER --password PASS \
--query "SELECT type, last_exception, num_tries FROM system.replication_queue WHERE last_exception != ''"     

Defaulted container "clickhouse" out of: clickhouse, clickhouse-backup

Result: this should ideally be empty (as in the output above). If you see exceptions here, your replicas aren't syncing correctly.


3. Verify the "Operator" View


The Altinity Operator provides a "Status" field in its Custom Resource that summarizes the health of the entire installation.

% kubectl get chi -n clickhouse

NAME         CLUSTERS   HOSTS   STATUS      HOSTS-COMPLETED   AGE
clickhouse   1          1       Completed                     123d

What to look for: the STATUS column should say Completed. If it says InProgress or Error, the operator is struggling to configure the cluster.

4. Check the Backup (Safety Net)

Since you saw clickhouse-backup containers earlier, verify that the last backup actually succeeded. This is your "point of no return" check before an EKS upgrade.

kubectl logs -n clickhouse -l job-name=clickhouse-backup-cron-<TIMESTAMP> 

(Replace <TIMESTAMP> with one of the strings from your previous get all output, e.g., 29543400).

Look for: Done, Success, or Upload finished.

5. Check the Status of All Replicas 


To be absolutely sure the cluster is "Green" before you start the EKS upgrade, run this to check the status of all replicas in the cluster:

% kubectl exec -it chi-clickhouse-ch-0-0-0 -n clickhouse -- clickhouse-client --user USER --password PASS --query "SELECT replica_path, is_leader, is_readonly, future_parts FROM system.replicas"

  • is_readonly: should be 0. If it's 1, the node can't write data (usually a Zookeeper/Keeper issue).
  • is_leader: one of your replicas should be 1.


Summary Checklist


Test       Command              Good Result
Ping       SELECT 1             1
Uptime     SELECT uptime()      > 0
Storage    system.disks         Free space > 10%
Operator   kubectl get chi      Completed

