Installing ClickHouse on an AWS EKS cluster using Terraform and the Altinity Helm charts typically involves two stages:
- Installing the Altinity ClickHouse Operator
- Deploying a ClickHouse Installation (CHI)
The Altinity Helm repository is located at https://helm.altinity.com.
Prerequisites
Ensure your Terraform environment is configured with the following providers:
- aws: To manage EKS and underlying infrastructure.
- kubernetes: To interact with the EKS cluster.
- helm: To install the operator.
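A minimal provider wiring for these three could look like the sketch below. The region and cluster name are placeholders, the data sources assume the EKS cluster already exists, and the helm block uses provider 2.x syntax:

```hcl
terraform {
  required_providers {
    aws        = { source = "hashicorp/aws" }
    kubernetes = { source = "hashicorp/kubernetes" }
    helm       = { source = "hashicorp/helm" }
  }
}

provider "aws" {
  region = "us-east-1" # placeholder
}

# Assumes an existing EKS cluster named "my-eks-cluster"
data "aws_eks_cluster" "this" {
  name = "my-eks-cluster"
}

data "aws_eks_cluster_auth" "this" {
  name = "my-eks-cluster"
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.this.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.this.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.this.token
}

provider "helm" {
  kubernetes {
    host                   = data.aws_eks_cluster.this.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.this.certificate_authority[0].data)
    token                  = data.aws_eks_cluster_auth.this.token
  }
}
```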
Terraform Configuration
The following example uses the helm_release resource to install the operator and the kubernetes_manifest resource to deploy the actual ClickHouse cluster.
Step A: Install the Altinity Operator
The operator is the "brain" that manages ClickHouse instances on Kubernetes.
resource "helm_release" "clickhouse_operator" {
  name             = "clickhouse-operator"
  repository       = "https://helm.altinity.com"
  chart            = "altinity-clickhouse-operator"
  namespace        = "clickhouse-operator"
  create_namespace = true

  # Optional: Enable metrics for Prometheus
  set {
    name  = "metrics.enabled"
    value = "true"
  }
}
Step B: Deploy a ClickHouse Cluster (CHI)
Once the operator is running, you define your ClickHouse cluster as a Custom Resource (a ClickHouseInstallation, defined by the operator's CRD). In Terraform, you use kubernetes_manifest.
resource "kubernetes_manifest" "clickhouse_cluster" {
  depends_on = [helm_release.clickhouse_operator]

  manifest = {
    apiVersion = "clickhouse.altinity.com/v1"
    kind       = "ClickHouseInstallation"
    metadata = {
      name      = "simple-clickhouse"
      namespace = "clickhouse-operator"
    }
    spec = {
      configuration = {
        clusters = [
          {
            name = "cluster1"
            layout = {
              shardsCount   = 1
              replicasCount = 1
            }
          }
        ]
      }
    }
  }
}
Production Considerations for EKS
When running ClickHouse on EKS, you should consider storage and networking:
- Storage Class: Use AWS gp3 volumes for a good balance of price and performance. You can specify a volumeClaimTemplate in your kubernetes_manifest.
- Node Affinity: It is recommended to run ClickHouse on specific node groups (e.g., using i3 or r5 instances) to ensure it doesn't compete with other workloads for IOPS.
- Zookeeper/Keeper: For multi-node shards or replicas, you will need a Zookeeper cluster or the ClickHouse Keeper (also available via Altinity charts).
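As a sketch of the storage recommendation above, a volumeClaimTemplate inside the CHI spec can pin the claim to a gp3 StorageClass. The class name "gp3" and the template name are assumptions — use whatever StorageClass exists in your cluster:

```hcl
# Inside the "spec" map of the kubernetes_manifest shown earlier
templates = {
  volumeClaimTemplates = [
    {
      name = "data-volume"
      spec = {
        storageClassName = "gp3" # assumed StorageClass name
        accessModes      = ["ReadWriteOnce"]
        resources = {
          requests = {
            storage = "100Gi"
          }
        }
      }
    }
  ]
}
```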
EKS Module
Altinity maintains a dedicated Terraform EKS ClickHouse module that automates the entire VPC, EKS, and ClickHouse setup if you prefer a pre-packaged solution.
How to view the ClickHouse Installation Configuration?
% kubectl get chi -n clickhouse -o yaml
apiVersion: v1
items:
- apiVersion: clickhouse.altinity.com/v1
  kind: ClickHouseInstallation
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"clickhouse.altinity.com/v1","kind":"ClickHouseInstallation","metadata":....}
    creationTimestamp: "2025-01-28T14:35:46Z"
    finalizers:
    - finalizer.clickhouseinstallation.altinity.com
    generation: 12
    name: clickhouse
    namespace: clickhouse
    resourceVersion: "67251031"
    uid: 9fxxxx1-81e7-429b-9cf7-ffxxxxxxef
  spec:
    configuration:
      clusters:
      - layout:
          replicasCount: 1
          shardsCount: 1
        name: ch
        templates:
          dataVolumeClaimTemplate: ch-data
          podTemplate: ch-pod
          serviceTemplate: ch-svc
      users:
        admin/grants/query: GRANT ALL ON *.*
        admin/networks/ip: 0.0.0.0/0
        admin/password: my-admin-password
        # (or admin/password_sha256_hex: my-admin-password-in-sha256)
        admin/profile: xxxx
        admin/quota: xxxxx
        admin/settings/enable_http_compression: 1
        default/k8s_secret_password_sha256_hex: <namespace/secretName/key>
        default/profile: default
        default/quota: default
    templates:
      podTemplates:
      - name: ch-pod
        spec:
          containers:
          - image: altinity/clickhouse-server:24.8.14.10544.altinitystable
            name: clickhouse
          - args:
            - server
            env:
            - name: LOG_LEVEL
              value: info
            - name: API_LISTEN
              value: 0.0.0.0:7171
            - name: API_CREATE_INTEGRATION_TABLES
              value: "true"
            - name: REMOTE_STORAGE
              value: s3
            - name: BACKUPS_TO_KEEP_REMOTE
              value: "2"
            - name: S3_BUCKET
              value: my-clickhouse-backups
            - name: S3_REGION
              value: us-east-1
            - name: CLICKHOUSE_HOST
              value: localhost
            - name: CLICKHOUSE_USERNAME
              value: xxxxx
            - name: CLICKHOUSE_PASSWORD
              value: xxxx
            image: altinity/clickhouse-backup:latest
            imagePullPolicy: IfNotPresent
            name: clickhouse-backup
          serviceAccountName: clickhouse-backup
          tolerations:
          - effect: NoSchedule
            key: karpenter/clickhouse
            operator: Exists
      serviceTemplates:
      - metadata:
          annotations:
            service.beta.kubernetes.io/aws-load-balancer-ip-address-type: ipv4
            service.beta.kubernetes.io/aws-load-balancer-name: my-clickhouse-nlb
            service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
            service.beta.kubernetes.io/aws-load-balancer-scheme: internal
            service.beta.kubernetes.io/aws-load-balancer-type: nlb
          name: clickhouse
        name: ch-svc
        spec:
          ports:
          - name: http
            port: 8123
            targetPort: 8123
          - name: native
            port: 9000
            targetPort: 9000
          type: LoadBalancer
      volumeClaimTemplates:
      - name: ch-data
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 100Gi
  status:
    chop-commit: 9abcd12
    chop-date: 2025-01-24T08:40:12
    chop-ip: 10.x.x.x
    chop-version: 0.25.5
    clusters: 1
    endpoint: clickhouse-clickhouse.clickhouse.svc.cluster.local
    fqdns:
    - chi-clickhouse-ch-0-0.clickhouse.svc.cluster.local
    hosts: 1
    hostsWithTablesCreated:
    - chi-clickhouse-ch-0-0.clickhouse.svc.cluster.local
    pods:
    - chi-clickhouse-ch-0-0-0
    shards: 1
    status: Completed
    taskID: auto-1xxxxd2-5ba4-4c3a-9daa-baxxxxx850
    taskIDsCompleted:
    - auto-1fxxxxxd2-5ba4-4c3a-9daa-baxxxxxx50
    ...
    - auto-bbxxxx6-31e3-4a4c-b04b-e5xxxxxx91
    taskIDsStarted:
    - auto-31xxxxx37-492f-4109-b515-4axxxxxx6c8
    ...
    - auto-b8xxxxx7-0396-41e0-b5d1-95xxxxd48
kind: List
metadata:
  resourceVersion: ""
The users section shows user configuration in the form USER_NAME/ATTRIBUTE. In the example above there are two users: admin and default.
USER_NAME/password is a plain-text password. This is very convenient for debugging, but usually a security "no-no" for production, especially for the admin or default user!
USER_NAME/password_sha256_hex is a SHA256-hashed password.
USER_NAME/k8s_secret_password_sha256_hex: <namespace/SECRET_NAME/KEY_NAME> indicates that the USER_NAME ClickHouse user is secured using a Kubernetes Secret. In the example above, it maps the default user's password to a specific Secret.
- USER_NAME/k8s_secret_password_sha256_hex: specifies that the password for USER_NAME should be read from a Kubernetes Secret as a SHA256 hex string.
- <namespace/SECRET_NAME/KEY_NAME>: the reference to the Secret itself, structured as namespace/SECRET_NAME/KEY_NAME.
- Purpose: allows secure, GitOps-friendly password management, keeping plain-text passwords out of Kubernetes manifests.
- Implementation: the operator reads the Secret, hashes the password if necessary, and writes the result into /etc/clickhouse-server/users.d/chop-generated-users.xml inside the ClickHouse pod. If you have External Secrets installed, the Secret is likely being pulled from AWS Secrets Manager.
- Alternative: you can also use k8s_secret_env_password_sha256_hex to load the password via an environment variable.
In short, the syntax USER_NAME/k8s_secret_password_sha256_hex is a pointer: it tells the operator to look in a specific Secret to find the password hash for the USER_NAME user.
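To create such a Secret yourself, you can hash a password with sha256sum and store the hex digest. In this sketch, "changeme" is a placeholder password, not a value from the cluster:

```shell
# Hash a placeholder password ("changeme") to a SHA256 hex string
HASH=$(echo -n "changeme" | sha256sum | awk '{print $1}')
echo "$HASH"   # 64 hex characters
```

You could then store it with something like `kubectl create secret generic clickhouse-credentials -n clickhouse --from-literal=password_sha256_hex="$HASH"` and reference it as `default/k8s_secret_password_sha256_hex: clickhouse/clickhouse-credentials/password_sha256_hex` (the secret and key names here are hypothetical).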
To read the secret value (note that for the *_sha256_hex variants this is the password hash, not the plain-text password):
% kubectl get secret <SECRET_NAME> \
-n <NAMESPACE> \
-o jsonpath="{.data.<KEY_NAME>}" \
| base64 -d
NAMESPACE is usually clickhouse.
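The base64 -d step is needed because Kubernetes stores Secret data base64-encoded. A quick local round trip (with an arbitrary sample value) illustrates it:

```shell
# Kubernetes stores Secret values base64-encoded; decoding recovers the raw value
ENCODED=$(echo -n "0a1b2c3d" | base64)
echo "$ENCODED"
echo "$ENCODED" | base64 -d
```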
How to check ClickHouse health?
Since ClickHouse is running in our cluster, the best way to verify it's "working fine" is to move beyond just checking the Pod status and actually query the database engine itself.
Here is a step-by-step approach to verify health, connectivity, and data integrity.
1. The "Internal" Health Check
The quickest way is to execute a command directly inside the pod using the clickhouse-client. This bypasses networking issues and tells you if the engine is responsive. Run this command:
kubectl exec -it chi-clickhouse-ch-0-0 \
-n clickhouse \
-- clickhouse-client --query "SELECT version(), uptime()"
chi-clickhouse-ch-0-0 is the name of the pod; depending on the installation it may also look like chi-clickhouse-ch-0-0-0.
If this returns data, it means ClickHouse is successfully reading from its system tables on the EBS volume.
What to look for: It should return the version string and the number of seconds the server has been up. If this fails, the DB engine itself is hung.
If the default user has a password set, or the default user has been disabled, the output may show an error similar to this:
% kubectl exec -it chi-clickhouse-ch-0-0-0 -n clickhouse -- clickhouse-client --query "SELECT version(), uptime()"
Defaulted container "clickhouse" out of: clickhouse, clickhouse-backup
Code: 516. DB::Exception: Received from localhost:9000. DB::Exception: default: Authentication failed: password is incorrect, or there is no user with such name.
If you have installed ClickHouse and forgot password you can reset it in the configuration file.
The password for default user is typically located at /etc/clickhouse-server/users.d/default-password.xml
and deleting this file will reset the password.
See also /etc/clickhouse-server/users.xml on the server where ClickHouse is installed.
. (AUTHENTICATION_FAILED)
command terminated with exit code 4
Seeing an AUTHENTICATION_FAILED error instead of a Connection Refused error is actually a positive result for this check:
- Networking works: Your kubectl exec reached the pod.
- Process is alive: The ClickHouse server is running and actively rejecting bad logins.
- Storage is mounted: ClickHouse can't check credentials if it can't read its config files from disk.
If we know the ClickHouse credentials, we can perform the health check:
% kubectl exec -it chi-clickhouse-ch-0-0-0 -n clickhouse -- clickhouse-client --user USER --password PASS --query "SELECT version(), uptime(), name FROM system.clusters"
Defaulted container "clickhouse" out of: clickhouse, clickhouse-backup
24.8.14.10544.altinitystable 501389 all-clusters
24.8.14.10544.altinitystable 501389 all-replicated
24.8.14.10544.altinitystable 501389 all-sharded
24.8.14.10544.altinitystable 501389 ch
24.8.14.10544.altinitystable 501389 default
The output above is exactly what we wanted to see. The database is responsive, healthy, and has an uptime of ~5.8 days (501,389 seconds). The version 24.8.14.10544.altinitystable indicates we are on a very recent, stable Altinity build.
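The uptime arithmetic can be checked quickly (501,389 seconds at 86,400 seconds per day):

```shell
# Convert the reported uptime (501389 seconds) to days
awk 'BEGIN { printf "%.1f\n", 501389 / 86400 }'
# → 5.8
```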
2. Check Replication and Disk Health
Since you are using the Altinity Operator, ClickHouse is likely managing data across one or more disks. You want to ensure the system tables report no errors. Run this to check that the disks are mounted and have free space:
% kubectl exec -it chi-clickhouse-ch-0-0-0 \
-n clickhouse \
-- clickhouse-client --user USER --password PASS \
--query "SELECT name, path, formatReadableSize(free_space) AS free, formatReadableSize(total_space) AS total FROM system.disks"
Defaulted container "clickhouse" out of: clickhouse, clickhouse-backup
default /var/lib/clickhouse/ 89.60 GiB 95.80 GiB
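As a sanity check on the numbers above, free space as a percentage of total (a common rule of thumb is to keep it above roughly 10%):

```shell
# Free-space percentage from the system.disks figures above (89.60 of 95.80 GiB free)
awk 'BEGIN { printf "%.0f%%\n", 89.60 / 95.80 * 100 }'
# → 94%
```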
If you have multiple replicas (e.g., a ch-0-1 pod), check for replication lag:
% kubectl exec -it chi-clickhouse-ch-0-0-0 \
-n clickhouse \
-- clickhouse-client --user USER --password PASS \
--query "SELECT type, last_exception, num_tries FROM system.replication_queue WHERE last_exception != ''"
Defaulted container "clickhouse" out of: clickhouse, clickhouse-backup
Result: this should ideally be empty (as above). If you see exceptions here, your nodes aren't syncing correctly.
3. Verify the "Operator" View
The Altinity Operator provides a "Status" field in its Custom Resource that summarizes the health of the entire installation.
% kubectl get chi -n clickhouse
NAME         CLUSTERS   HOSTS   STATUS      HOSTS-COMPLETED   AGE
clickhouse   1          1       Completed                     123d
What to look for: the STATUS column should say Completed. If it says InProgress or Error, the Operator is struggling to configure the cluster.
4. Check the Backup (Safety Net)
Since you saw clickhouse-backup pods earlier, verify that the last backup actually succeeded. This is your "point of no return" check before the upgrade.
kubectl logs -n clickhouse -l job-name=clickhouse-backup-cron-<TIMESTAMP>
(Replace <TIMESTAMP> with one of the strings from your previous get all output, e.g., 29543400).
Look for: Done, Success, or Upload finished.
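A small helper that scans log text on stdin for those success markers; the marker strings are taken from the list above, so adapt them to your clickhouse-backup version:

```shell
# Return OK if the log stream contains a success marker, FAILED otherwise
check_backup_log() {
  if grep -Eq "Done|Success|Upload finished"; then
    echo "OK"
  else
    echo "FAILED"
  fi
}

# Example on a sample log line:
echo "2025/01/28 14:35:46 Upload finished" | check_backup_log
# → OK
```

In practice you would pipe the real logs through it, e.g. `kubectl logs -n clickhouse -l job-name=clickhouse-backup-cron-<TIMESTAMP> | check_backup_log`.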
5. Check the Status of All Replicas
To be absolutely sure the cluster is "Green" before you start the EKS upgrade, run this to check the status of all replicas in the cluster:
% kubectl exec -it chi-clickhouse-ch-0-0-0 -n clickhouse -- clickhouse-client --user USER --password PASS --query "SELECT replica_path, is_leader, is_readonly, future_parts FROM system.replicas"
is_readonly: Should be 0. If it's 1, the node can't write data (usually a Zookeeper issue).
is_leader: One of your replicas should be 1.
Summary Checklist
Test       Command            Good Result
Ping       SELECT 1           1
Uptime     SELECT uptime()    > 0
Storage    system.disks       Free space > 10%
Operator   kubectl get chi    Completed
