Tuesday, 23 December 2025

Kubernetes EndpointSlices

 

EndpointSlices:
  • a modern, scalable way to track which Pods back a Service and how to reach them
  • split Service endpoints into small, scalable chunks that Kubernetes networking components can efficiently consume
  • replace (and improve on) the older Endpoints object

The problem they solve


Originally, Kubernetes used a single Endpoints object per Service that listed all Pod IPs and ports.

This caused issues at scale:
  • Large Services (hundreds/thousands of Pods) created huge objects
  • Frequent updates caused API server and etcd pressure
  • Harder to extend with metadata (zone, topology, readiness, etc.)

What EndpointSlices are


An EndpointSlice is:

  • A Kubernetes API object (discovery.k8s.io/v1)
  • Owned by a Service
  • Contains a subset of endpoints (Pods or external IPs)
  • Typically holds up to ~100 endpoints per slice (configurable)

So instead of:

Service → 1 big Endpoints object

We have:

Service → multiple EndpointSlice objects

What’s inside an EndpointSlice


An EndpointSlice includes:

  • Endpoints
    • IP addresses (IPv4 / IPv6)
    • Ready / serving / terminating status
    • Zone & node info
  • Ports
    • Name, port number, protocol
  • AddressType
    • IPv4, IPv6, or FQDN
  • Labels
    • Including kubernetes.io/service-name

Example (simplified):

apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: my-service-abcde
  labels:
    kubernetes.io/service-name: my-service
addressType: IPv4
ports:
- name: http
  port: 80
  protocol: TCP
endpoints:
- addresses:
  - 10.244.1.12
  conditions:
    ready: true
    serving: true
    terminating: false


How they’re used


  • kube-proxy reads EndpointSlices to program iptables/IPVS rules
  • CoreDNS uses them for Service DNS resolution
  • Controllers watch them instead of Endpoints
  • The old Endpoints object still exists for backward compatibility

Ingress → Service → EndpointSlice → Pod IPs

Since modern clusters use EndpointSlices by default, lacking RBAC read access to them breaks:
  • kubectl describe ingress
  • kubectl describe service
  • Some kubectl get ... -o wide outputs
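
If the problem is missing RBAC permissions, a minimal read-only Role like the following sketch restores access (name and namespace are placeholders); EndpointSlices live in the discovery.k8s.io API group:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: endpointslice-reader   # placeholder name
  namespace: default           # placeholder namespace
rules:
- apiGroups: ["discovery.k8s.io"]
  resources: ["endpointslices"]
  verbs: ["get", "list", "watch"]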

Why EndpointSlices are better


✅ Scales much better
✅ Smaller, more frequent updates
✅ Topology-aware routing (zones, nodes)
✅ Supports dual-stack (IPv4 + IPv6)
✅ Extensible for future networking features

Relationship to Endpoints (important)


  • Modern clusters use EndpointSlices by default
  • Kubernetes still creates Endpoints objects unless disabled
  • You should:
    • Read EndpointSlices
    • Avoid writing Endpoints directly in new tooling

When you’ll notice them (DevOps angle)


As a DevOps engineer, you’ll run into EndpointSlices when:

  • Debugging Services with many Pods
  • Investigating kube-proxy or networking issues
  • Watching API load in large clusters
  • Writing controllers or operators
  • Tuning Service performance at scale

Useful commands:

kubectl get endpointslices
kubectl describe endpointslice <name>
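
Because each slice carries the kubernetes.io/service-name label (see above), we can also list only the slices that back a particular Service; my-service is a placeholder:

kubectl get endpointslices -l kubernetes.io/service-name=my-service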


---

Kubernetes Objects

  


Kubernetes objects can be divided into two groups:

  • API objects = persistent, declarative resources stored in etcd
  • Non-API objects = runtime, node-local, or in-memory artifacts created to implement API objects

Kubernetes is fundamentally an API-driven control plane + runtime realization.

What are Kubernetes API objects?


Kubernetes API objects are persistent entities stored in etcd that represent the desired state and current state of your cluster.

They are:
  • Defined by a schema
  • Exposed via the Kubernetes API server
  • Created, read, updated, deleted via kubectl, controllers, or the API
  • Continuously reconciled by controllers
In short, if it’s stored in etcd and managed via the API server, it’s an API object.

Common Kubernetes API objects (by category)


Workloads

  • Pod
  • Deployment
  • ReplicaSet
  • StatefulSet
  • DaemonSet
  • Job
  • CronJob

Networking

  • Service
  • Ingress
  • NetworkPolicy
  • EndpointSlice
  • Endpoints (legacy)

Configuration & storage

  • ConfigMap
  • Secret
  • PersistentVolume (PV)
  • PersistentVolumeClaim (PVC)
  • StorageClass
  • VolumeSnapshot

Cluster & node

  • Node
  • Namespace
  • ResourceQuota
  • LimitRange
  • RuntimeClass

Security & access

  • ServiceAccount
  • Role, ClusterRole
  • RoleBinding, ClusterRoleBinding
  • PodSecurityPolicy (deprecated)
  • PodSecurityAdmission (via config)

Custom & extension

  • CustomResourceDefinition (CRD)
  • Any Custom Resource (e.g. Certificate, Prometheus, IngressRoute)

Key properties of API objects


Every API object has:

apiVersion: ...
kind: ...
metadata:
  name: ...
  namespace: ...
spec:        # desired state
status:      # current state (set by controllers)


And importantly:

Declarative: you describe what you want
Reconciled: controllers try to make reality match spec
Durable: survive restarts

What are “non-API” Kubernetes objects?


This is where the term object becomes fuzzy.

There are many things in Kubernetes that are real, important, and named — but are NOT API objects because they:
  • Are not stored in etcd
  • Do not have YAML
  • Are not CRUD’d via the API
  • Are usually runtime, internal, or node-local

Major categories of non-API Kubernetes “objects”


1️⃣ Containers

  • Individual containers inside a Pod
  • Managed by the container runtime (containerd / CRI-O)
  • Kubernetes does NOT store container objects
Example:
  • You can’t do kubectl get container
  • Containers exist only while the Pod is running
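
As a quick illustration (the pod name is a placeholder and the exact error wording can vary by kubectl version):

kubectl get container
# error: the server doesn't have a resource type "container"

# Containers only show up inside the Pod object's status:
kubectl get pod my-pod -o jsonpath='{.status.containerStatuses[*].name}'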

2️⃣ Volumes (runtime volumes)


While PV/PVC are API objects, the actual mounted volumes are not.

Examples:
  • emptyDir
  • hostPath
  • Mounted CSI volumes
These are:
  • Created on the node
  • Managed by kubelet + CSI
  • Invisible to the API as concrete objects

3️⃣ kube-proxy rules (iptables / IPVS)


  • iptables chains
  • IPVS virtual servers
  • eBPF maps (Cilium)
These implement Services but are:
  • Node-local
  • Ephemeral
  • Derived from API objects
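
As a rough illustration, assuming kube-proxy runs in iptables mode (chain names can vary by version and CNI), the node-local rules a Service produces can be inspected directly on a node:

sudo iptables -t nat -L KUBE-SERVICES -n | grep <cluster-ip-of-the-service>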

4️⃣ Scheduler & controller internals


Examples:
  • Scheduling queues
  • Controller work queues
  • Leader election locks (in-memory)
  • Cached informers
These exist:
  • In process memory
  • For performance and correctness
  • Never stored in etcd

5️⃣ CNI networking artifacts


Examples:
  • Network namespaces
  • veth pairs
  • Routes
  • eBPF programs
  • Overlay tunnels

Created by:
  • kubelet
  • CNI plugins (Calico, Cilium, Flannel)

Not visible as API objects.

6️⃣ DNS records


  • Service A/AAAA records
  • Pod DNS entries
They’re derived from:
  • Services
  • EndpointSlices
But the records themselves:
  • Live inside CoreDNS
  • Are regenerated dynamically
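
A quick way to see such a derived record, assuming a Service named my-service in the default namespace (the image and names are placeholders; any image that ships nslookup works):

kubectl run -it --rm dns-test --image=busybox:1.36 --restart=Never -- \
  nslookup my-service.default.svc.cluster.local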

7️⃣ Events (kind of special)


Event is an API object, but:
  • Short-lived
  • Often garbage-collected quickly
  • Not something you “own” declaratively
This blurs the line between API and runtime artifacts.
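
For example, Events can be read like any other API object; they just disappear once their retention window passes:

kubectl get events -A --sort-by=.metadata.creationTimestamp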

8️⃣ Admission & runtime state


Examples:
  • Mutated Pod specs (post-admission)
  • Pod sandbox state
  • cgroups
  • seccomp / AppArmor profiles applied

These exist at runtime, not as first-class objects.


A useful mental model


Three layers of “things” in Kubernetes
┌─────────────────────────────┐
│  API Objects (etcd)         │  ← declarative, durable
│  Pods, Services, Secrets    │
├─────────────────────────────┤
│  Controllers & Reconcilers  │  ← logic
├─────────────────────────────┤
│  Runtime Artifacts          │  ← ephemeral, node-local
│  Containers, iptables, CNI  │
└─────────────────────────────┘

Rule of thumb


Ask these questions:

Can I kubectl get it?
→ Yes → API object
→ No → probably a runtime artifact

Is it stored in etcd?
→ Yes → API object

Does it survive a full cluster restart?
→ Yes → API object
→ No → runtime artifact

Why this distinction matters (DevOps relevance)

  • Explains why “kubectl says it exists but it’s not working”
  • Helps debug node-local vs cluster-wide issues
  • Critical when writing controllers, operators, or CRDs
  • Clarifies what’s source of truth vs derived state

---

Thursday, 18 December 2025

Security Hardening of AWS EC2 Instances



Are password authentication and root login on an Amazon EC2 instance disabled by default?


From Manage system users on your Amazon EC2 Linux instance - Amazon Elastic Compute Cloud:

By default, password authentication and root login are disabled, and sudo is enabled. To log in to your instance, you must use a key pair. For more information about logging in, see Connect to your Linux instance using SSH.

You can allow password authentication and root login for your instance. For more information, see the documentation for your operating system.


On a standard Amazon EC2 Linux instance, both are disabled by default, and SSH key-based login with a non-root user is required.​

Default SSH access

By default, you connect to an EC2 Linux instance using a non-root account such as ec2-user (Amazon Linux) or ubuntu (Ubuntu) with an SSH key pair, not a password. This design enforces public key authentication and avoids exposing password-based logins on the internet.​

Password authentication

Password authentication over SSH is disabled by default on EC2 Linux instances, so you cannot log in with a username and password until you explicitly enable it in sshd_config. To log in initially, you must use the key pair specified when the instance was launched.​

Root login

Direct root SSH login is also disabled by default; you are expected to log in as the default user and then use sudo to gain root privileges. Root login can be enabled later by changing PermitRootLogin in sshd_config, but this is discouraged from a security standpoint.


To verify the current settings, check /etc/ssh/sshd_config and look for these settings (their value can be yes or no):
  • PermitRootLogin
  • PasswordAuthentication
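
For example, to see the effective values (sshd -T dumps the effective configuration; keys are printed in lowercase):

sudo sshd -T | grep -Ei '^(permitrootlogin|passwordauthentication)'

# or grep the config file directly
grep -Ei '^(PermitRootLogin|PasswordAuthentication)' /etc/ssh/sshd_config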

---

Monday, 24 November 2025

How to run Node, npm, Prettier, Yarn and Serverless via Docker

 

We sometimes don't want to pollute our local machine by installing Node if we don't use it often. In this scenario we can run a desired version of Node via a Docker container:

docker run --rm \
  node:16-alpine \
  sh -c "node --version"

Output:

v16.20.2


Running npm

The above also means that we can use Node tools against our local Node application repository, without the need to install Node locally:

docker run --rm \
  -v "$PWD":/app \
  -w /app \
  node:16-alpine \
  sh -c "npm install && npm audit"


We can run npm audit fix to automatically fix issues that don't require breaking changes, and npm audit fix --force to address all issues (including those that involve breaking changes).


The above command should be run from the project's root directory.

If package.json lists a dependency on a private package hosted on GitHub Packages, e.g.:

  "dependencies": {
    ...
    "@foo/bar": "^0.5.4",
    ...
  } 

...and inside Docker there is no GitHub token, npm install might throw this error because npm can't authenticate against GitHub:

npm ERR! 401 Unauthorized - GET https://npm.pkg.github.com/download/@foo/bar/0.5.4/db46279e9b10a74cec83b15ac06422c479e4d193fd3c8366c839ace085244c9b - authentication token not provided

This token is a GitHub Personal Access Token (PAT) and it should have permission to read our private npm package.

If our local machine is authenticated via the GitHub CLI (gh), we can run:

gh auth token

The output is a PAT that npm can use, in this format:

ghp_xxx...

We can store this value in a local environment variable:

export NODE_AUTH_TOKEN=$(gh auth token)

We now need to create a local .npmrc file which contains the authentication configuration:

echo "@foo:registry=https://npm.pkg.github.com/" > .npmrc
echo "//npm.pkg.github.com/:_authToken=${NODE_AUTH_TOKEN}" >> .npmrc

...so .npmrc file will look like this:

@foo:registry=https://npm.pkg.github.com/
//npm.pkg.github.com/:_authToken=ghp_aAT3B3N...iiOO

If we execute npm install again, the missing-token issue should now be resolved.
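
As an alternative sketch, we can avoid writing the token itself into .npmrc: npm expands environment variables written as ${VAR} in .npmrc, so the file can keep the literal placeholder and the token can be passed into the container as an environment variable:

echo "@foo:registry=https://npm.pkg.github.com/" > .npmrc
echo "//npm.pkg.github.com/:_authToken=\${NODE_AUTH_TOKEN}" >> .npmrc

docker run --rm \
  -v "$PWD":/app \
  -w /app \
  -e NODE_AUTH_TOKEN=$(gh auth token) \
  node:16-alpine \
  sh -c "npm install && npm audit"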


Running Prettier

If prettier is added as a devDependency in a Node.js project and a .prettierrc config file is provided, we can run prettier from the Node Docker container, with no need to install Node on the local machine.

To find style errors:

docker run --rm \
  -v "$PWD":/app \
  -w /app \
  node:18-alpine \
  sh -c "npm install && npx prettier --ignore-path .gitignore --check '**/*.+(ts|js|json|yml)'"

To fix the reported style errors:

docker run --rm \
  -v "$PWD":/app \
  -w /app \
  node:18-alpine \
  sh -c "npm install && npx prettier --ignore-path .gitignore --write '**/*.+(ts|js|json|yml)'"


Running yarn

The Node Docker image comes with yarn preinstalled, so we can use it just as we used npm:

docker run --rm \
  -v "$PWD":/app \
  -w /app \
  node:18-alpine \
  sh -c "yarn install --frozen-lockfile && yarn audit --audit-level=critical"


To remove a dependency:

docker run --rm \
  -v "$PWD":/app \
  -w /app \
  node:20-alpine \
  sh -c "yarn remove serverless-esbuild" 

Running Serverless


If we have a Node-based Serverless project, we can run a Serverless deployment from the Node Docker container:

docker run --rm \
  -v "$PWD":/app \
  -w /app \
  -e SERVERLESS_ACCESS_KEY=xxxx \
  node:18-alpine \
  sh -c "npm install -g serverless && yarn install --frozen-lockfile && yarn sls deploy --stage development"


Or, alternatively:

docker run --rm \
  -v "$PWD":/app \
  -w /app -e SERVERLESS_ACCESS_KEY=xxxx \
  node:22-alpine \
  sh -c "npm install -g serverless && npm ci && sls deploy --stage development" 


Running tsc


docker run --rm \
  -v "$PWD":/app \
  -w /app -e SERVERLESS_ACCESS_KEY=xxx \
  node:22-alpine \
  sh -c "npm install -g serverless && npm ci && npx tsc -p ./tsconfig.json --noEmit --skipLibCheck"

---

How to run TypeScript compiler (tsc) via Docker

 

The TypeScript compiler, usually referred to as tsc, is responsible for compiling TypeScript code into JavaScript. It takes TypeScript source files as input and generates equivalent JavaScript files that can run in any JavaScript environment, including browsers.

If for any reason we don't want to install the TypeScript compiler on our machine but still want to use it to check the TypeScript syntax in a project, we can run it from a Docker container.

Let's first formulate a command which only checks the syntax:

tsc \
   --project ./tsconfig.json \
   --noEmit \
   --skipLibCheck

--project ./tsconfig.json - specifies the project configuration file (we can also use the short version of --project which is -p). tsc will compile only files included by that config.

--noEmit - TypeScript performs type-checking only, but does not output any .js, .d.ts, or build artifacts. Useful for: CI validation, pre-commit checks, linting purely for types, speeding up checks when we don’t care about compiled output.

--skipLibCheck - Tells TS not to check types inside node_modules or .d.ts libraries. This makes type checking much faster and avoids irrelevant type errors from dependencies. It skips: DefinitelyTyped typings, node_modules/*/*.d.ts, any imported library declaration file.


Before running the TypeScript compiler, we need to install the project's dependencies by running:

npm install

This is because:
  • This installs tsc (if package.json lists typescript among devDependencies, which it should in our case)
  • tsc must load types from our dependencies (@types/..., or any .d.ts shipped by packages).
  • Without node_modules, TypeScript cannot resolve imports like import express from "express", and type-checking will fail.


If we have Node installed locally, we can run npm install before running Docker; otherwise, we can just invoke npm from the Docker image (npm install will install tsc):

docker run --rm \
  -v "$PWD":/app \
  -w /app \
  node:20-alpine \
  sh -c "npm install && npx tsc -p ./tsconfig.json --noEmit --skipLibCheck --listFiles --diagnostics"

-v "$PWD":/app - mounts our current directory to the container
-w /app - sets the working directory
node:20-alpine - lightweight Node image
npm install && npx tsc ... - installs project dependencies and runs TypeScript from our local node_modules


I also added the --listFiles and --diagnostics flags so that tsc outputs something even when all files pass the checks; otherwise it emits no message at all.


If we are using Yarn instead of npm, we can call tsc directly from yarn:

docker run --rm \
  -v "$PWD":/app \
  -w /app \
  node:20-alpine \
  sh -c "yarn && yarn tsc -p ./tsconfig.json --noEmit --skipLibCheck"


---

Friday, 31 October 2025

Elasticsearch Nodes


Elasticsearch nodes are individual instances of Elasticsearch servers that are part of a cluster. Each node stores data and participates in the cluster’s indexing and search capabilities, playing a critical role in the distributed architecture of Elasticsearch.​

Key Points about Elasticsearch Nodes:


A node is a single server or instance running Elasticsearch, identified by a unique name.

Nodes collectively form a cluster, which is a group of Elasticsearch nodes working together.

Nodes can have different roles:
  • Master Node: Manages the cluster state and handles cluster-wide actions like adding/removing nodes and creating/deleting indices.
  • Data Node: Stores data and executes data-related operations such as searches and aggregations.
  • Client (Coordinating) Node: Routes requests to the appropriate nodes but does not hold data.
  • Other special roles include ingestion and machine learning nodes.

Nodes communicate through TCP ports (commonly 9200 for REST API and 9300 for node-to-node communication).

Elasticsearch distributes data across nodes using shards, enabling horizontal scalability, fault tolerance, and high availability.​

In essence, nodes are the building blocks of an Elasticsearch cluster, with each node running on a server (physical or virtual) and working in coordination to provide fast search and analytics on distributed data.

To list all nodes with their attributes we can run this command in Kibana DevTools:


GET /_cat/nodes?v

Output example:

ip            heap.percent ram.percent cpu load_1m load_5m load_15m    node.role       master name
10.199.43.136           44          61   5    1.69    1.71     1.51 cdfhilmrstw -      default-2
10.199.6.164            38          55   4    0.96    1.40     1.33 cdfhilmrstw -      default-1
10.199.30.70            25          51   9    1.61    1.57     1.06 cdfhilmrstw -      data-0
10.199.38.215           46         100  13    1.69    1.71     1.51 cdfhilmrstw -      data-1
10.199.1.249            81          76  30    0.96    1.40     1.33 cdfhilmrstw *      monitoring-1
10.199.32.134           75         100  27    1.69    1.71     1.51 cdfhilmrstw -      monitoring-0
10.199.23.94            77         100  26    1.61    1.57     1.06 cdfhilmrstw -      monitoring-2
10.199.18.75            23          91  19    1.61    1.57     1.06 cdfhilmrstw -      default-0
10.199.15.193           59          56   5    0.96    1.40     1.33 cdfhilmrstw -      data-2


Node Attributes

node_id 
  • e.g. aZB9fwOuRWCpINHh6IrOJg
node_name
  • monitoring-2
transport_address
  • e.g. 10.2.31.148:9300
node_attributes
  • ml.allocated_processors_double
    • e.g. 1.0
  • ml.config_version
    • e.g. 12.0.0
  • transform.config_version
    • e.g. 10.0.0
  • xpack.installed
    • true
  • ml.allocated_processors
    • e.g. 1
  • k8s_node_name
    • ip-10-2-30-209.us-east-2.compute.internal
  • ml.machine_memory
    • 6442450944
  • ml.max_jvm_size
    • e.g. 2147483648
  • type
    • TODO: how is it set?
    • e.g. monitoring
    • used by the filter decider, which decides whether an index shard may be allocated on a node by checking if the node has the attribute value required by the index. If it doesn't, we'll see a message like this:
"deciders": [
  ...
  {
    "decider": "filter",
    "decision": "NO",
    "explanation": """node does not match index setting [index.routing.allocation.require] filters [type:"monitoring"]"""
  },
  ...
]
 

roles
  • data
  • data_cold
  • data_content
  • data_frozen
  • data_hot
  • data_warm
  • ingest
  • master
  • ml
  • remote_cluster_client
  • transform
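
The attributes and roles listed above can also be pulled per node via the nodes info API; filter_path just trims the response:

GET /_nodes?filter_path=nodes.*.name,nodes.*.roles,nodes.*.attributes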



---

Elasticsearch Indices




An Elasticsearch index is a logical namespace that stores and organizes a collection of related JSON documents, similar to a database table in relational databases but designed for full-text search and analytics. 

Each index is uniquely named and can contain any number of documents, where each document is a set of key-value pairs (fields) representing your data.​

Key Features of an Elasticsearch Index


  • Structure: An index consists of one or more shards, which are distributed across nodes in the Elasticsearch cluster for scalability and resilience.
  • Mapping and Search: Indices define mappings that control how document fields are stored and searched.
  • Indexing Process: Data is ingested and stored as JSON documents in the index, and Elasticsearch builds an inverted index to allow for fast searches.​
  • Use Case: Indices are used to organize datasets in log analysis, search applications, analytics, or any scenario where rapid search/retrieval is needed.​

In summary, an Elasticsearch index is the foundational storage and retrieval structure enabling efficient search and analytics on large datasets.

When analysing an arbitrary index, we want to know:
  • its size
  • shards
    • their number
    • allocation - on which nodes they are allocated (and the allocation criteria: which node types these shards should be allocated to)
  • whether it has any data retention defined (Index Lifecycle Policy)
  • historical rate/growth of storage usage for data


Index Lifecycle Policy (ILM)


An Index Lifecycle Management (ILM) policy defines what happens to an index as it ages — automatically. It’s a set of rules for retention, rollover, shrink, freeze, and delete.

Example:

PUT _ilm/policy/functionbeat
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_age": "30d", "max_size": "50GB" }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": { "delete": {} }
      }
    }
  }
}


This says:
  • Keep the index hot (actively written to) until it’s 30 days old or 50 GB big.
  • Then roll over (create a new index and switch writes to it).
  • After 90 days, delete the old index.

ILM can be applied to a standard (non–data stream) index. We can attach an ILM policy to any index, not just data streams. However, there’s a big difference:

  • Rollover alias required:
    • Standard Index: Yes. We must manually set up an alias to make rollover work!
    • Data Stream: No (handled automatically - Elastic manages the alias and the backing indices)
  • Multiple backing indices
    • Standard Index: Optional (via rollover)
    • Data Stream: Always (that’s how data streams work)
  • Simplified management
    • Standard Index: Manual setup
    • Data Stream: Built-in

Index Rollover vs Data Stream


If we have a continuous stream of documents (e.g. logs) being written to Elasticsearch, we should not write them to a regular index, as its size will grow over time and we'll need to keep increasing the node storage. Instead, we should consider one of the following options:

  1. Data Stream
  2. Index with ILM policy which defines a rollover conditions

What does rollover mean for a standard index?

When a rollover is triggered (by size, age, or doc count):

  • Elasticsearch creates a new index with the same alias.
  • The alias used for writes (e.g. functionbeat-write) is moved from the old index to the new one.
  • Functionbeat or Logstash continues writing to the same alias, unaware that rollover happened.


Example:

# Initially
functionbeat-000001  (write alias: functionbeat-write)

# After rollover
functionbeat-000001  (read-only)
functionbeat-000002  (write alias: functionbeat-write)


This keeps the write flow continuous and allows you to:
  • Manage old data (delete, freeze, move to cold tier)
  • Limit index size for performance

How to apply ILM to a standard index?

Here’s a minimal configuration:

PUT _ilm/policy/functionbeat
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_age": "30d", "max_size": "50GB" }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": { "delete": {} }
      }
    }
  }
}

PUT _template/functionbeat
{
  "index_patterns": ["functionbeat-*"],
  "settings": {
    "index.lifecycle.name": "functionbeat",
    "index.lifecycle.rollover_alias": "functionbeat-write"
  }
}


The following command creates a new index called functionbeat-000001 (if it doesn't already exist); if the index does exist, it only updates the aliases section. It also creates an alias named functionbeat-write that points to this index. Aliases are like virtual index names: you can send reads or writes to the alias instead of a specific index, and they are lightweight and flexible.

"is_write_index": true tells Elasticsearch: "When someone writes to this alias, route the write operations to this index." If you later have functionbeat-000001 and functionbeat-000002 and both share the alias functionbeat-write, only the one with "is_write_index": true will receive new documents.

PUT functionbeat-000001
{
  "aliases": {
    "functionbeat-write": { "is_write_index": true }
  }
}


ILM rollover works by:
  • Watching the alias (functionbeat-write), not a specific index.
  • When rollover conditions are met (e.g. 50 GB or 30 days), Elasticsearch:
    • Creates a new index (functionbeat-000002)
    • Moves "is_write_index": true from 000001 to 000002. From that moment, all new Functionbeat writes go to the new index — automatically.
After rollover:
  • functionbeat-000001 becomes read-only, but still searchable.
  • ILM will later delete it when it ages out (based on your policy).

So that last command effectively bootstraps the first generation of an ILM-managed index family.
  • ILM policy: Automates rollover, delete, etc.
  • Rollover action: Creates a new index and shifts the alias
  • Alias requirement: Required, used for write continuity
  • Data stream alternative: Better option, handles rollover and aliasing for you
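
For comparison, a rough sketch of the data stream route (the names are illustrative): a composable index template with a data_stream section, after which the data stream and its backing indices are managed automatically.

PUT _index_template/functionbeat-ds
{
  "index_patterns": ["functionbeat-ds"],
  "data_stream": {},
  "template": {
    "settings": { "index.lifecycle.name": "functionbeat" }
  }
}

PUT _data_stream/functionbeat-ds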

Index Template

Index templates do not retroactively apply to existing indices. They only apply automatically to new indices created after the template exists.

When we define an index template like:

PUT _index_template/functionbeat
{
  "index_patterns": ["functionbeat-*"],
  "template": {
    "settings": {
      "index.lifecycle.name": "functionbeat"
    }
  }
}


That template becomes part of the index creation logic.

So:

When a new index is created (manually or via rollover),
→ Elasticsearch checks all templates matching the name.
→ The matching template(s) are merged into the new index settings.

Existing indices are not touched or updated.

If we already have an index — e.g. functionbeat-8.7.1 — that matches the template pattern, it won’t automatically get the template settings.

We need to apply those manually, for example:

PUT functionbeat-8.7.1/_settings
{
  "index.lifecycle.name": "functionbeat",
  "index.lifecycle.rollover_alias": "functionbeat-write"
}

Now the existing index is under ILM control (using the same settings the template would have applied if it were created fresh).

Elasticsearch treats index templates as blueprints for new indices, not as live configurations.
This is intentional — applying settings automatically to existing indices could cause:
  • unintended allocation moves,
  • mapping conflicts,
  • or lifecycle phase resets.

We want to keep as little data as possible in Elasticsearch. If the stored data are logs, we want to:
  • make sure apps are sending only meaningful logs
  • make sure we capture repetitive error messages so the app can be fixed and stop emitting them

Component templates are building blocks for constructing index templates that specify index mappings, settings, and aliases.

If an index template is composed of several component templates and more than one of them sets an ILM policy, the ILM policy from the last component template wins and is the one applied to indices or data streams created from that index template.

Data retention is applied at the document level, while an ILM policy is applied at the index level; when both are defined, the ILM policy takes precedence. Data retention is defined in component templates.

Shards and Replicas


We can set the number of shards and replicas per index in Elasticsearch when we create the index, and we can dynamically update the number of replicas (but not the number of primary shards) for existing indices.​

Setting Shards and Replicas on Index Creation


Specify the desired number in the index settings payload:


PUT /indexName
{
  "settings": {
    "index": {
      "number_of_shards": 6,
      "number_of_replicas": 2
    }
  }
}

This creates the index with 6 primary shards and 2 replicas per primary shard.​

Adjusting Replicas After Creation


You can adjust the number of replicas for an existing index using the settings API:


PUT /indexName/_settings
{
  "index": {
    "number_of_replicas": 3
  }
}

Replicas can be changed at any time, but the number of primary shards is fixed for the lifetime of the index.​

Shard and Replica Principles


Each index has a configurable number of primary shards.
Each primary shard can have multiple replica shards (copies).
Replicas improve fault tolerance and can spread search load.​

We should choose shard and replica counts based on data size, node count, and performance needs. Adjusting these settings impacts resource usage and indexing/search performance.


Index Size


To find out the size of each index's shards, we can use the following Kibana DevTools query:


GET /_cat/shards?v&h=index,shard,prirep,state,unassigned.reason,node,store&s=store:desc

The output contains the following columns:
  • index - index name
  • shard - the shard number. If we have 2 primary shards, each with 1 replica, we'd have 4 rows: shard=0 for the first two rows (first primary and its replica) and shard=1 for the next two rows (second primary and its replica)
  • prirep - is shard a primary (p) or replica (r)
  • state - e.g. STARTED
  • unassigned.reason - why a shard is unassigned (if it is)
  • node - name of the node
  • store - used storage (in gb, mb or kb)


Each shard should not be larger than 50 GB. We can enforce this via an Index Lifecycle Policy where we set rollover criteria.
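
To see sizes aggregated per index rather than per shard, a query like this can be used:

GET /_cat/indices?v&h=index,pri,rep,docs.count,pri.store.size,store.size&s=store.size:desc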


Document Routing 


Routing in an Elasticsearch cluster determines which shard a document is sent to and stored in. By default, the shard is chosen from a hash of the document's _id, which ensures an even distribution. However, you can implement custom routing based on a specific field value to ensure related documents land on the same shard, which improves search performance by reducing the scope of queries.

How routing works


Default routing: When a document is indexed, Elasticsearch calculates the target shard by hashing the document's ID and using the formula shard = hash(_routing) % number_of_primary_shards. The default _routing value is the _id of the document.

Custom routing: You can specify a different routing value, like a user ID or a country code, by providing it during indexing. This directs all documents with the same routing value to the same shard, which can significantly speed up queries that filter by that value.

Querying with custom routing: When you perform a query, you can provide the same routing value. 
Elasticsearch will then only search the specific shard containing documents with that value, rather than searching all shards in the cluster. 
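
A minimal sketch of both steps (index name, routing value and field are illustrative):

# Index a document under a custom routing value
PUT my-index/_doc/1?routing=user-42
{ "user": "user-42", "message": "hello" }

# Search only the shard that holds documents routed with that value
GET my-index/_search?routing=user-42
{ "query": { "term": { "user": "user-42" } } }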

Benefits of custom routing


Improved search speed: By narrowing the search to a specific shard, you reduce the amount of data that needs to be searched, leading to faster results.

Efficient resource use: Routing minimizes the computational overhead on the cluster because nodes don't have to process queries that are irrelevant to their data.

Scalability for multitenant applications: Routing is crucial for horizontal scaling in applications with multiple tenants, as it can isolate each tenant's data to specific shards. 

Considerations


Data distribution: If you use a custom routing value, ensure the data is relatively evenly distributed across all shards. If one shard accumulates a disproportionate amount of data, it can create performance bottlenecks.

Security: For multitenant applications, the application layer must handle security and access checks to prevent users from querying data from another user's shard, as Elasticsearch does not enforce this isolation automatically. 


Shard Allocation 


Just as there are rules which determine into which shard a document is written, there are rules which determine onto which node a shard is allocated.

 
Routing allocation watermarks in an Elasticsearch cluster are thresholds that control shard allocation based on a node's disk usage to prevent the cluster from running out of space. The three main watermarks are low (stops allocating new shards), high (relocates existing shards), and flood stage (makes indices read-only). These settings help maintain stability by proactively managing disk space, but should be adjusted based on cluster needs and capacity.
 

Watermark thresholds


Low watermark: The default is 85%. When a node's disk usage exceeds this limit, Elasticsearch stops allocating new shards to that node.

High watermark: The default is 90%. If a node's disk usage goes above this threshold, Elasticsearch starts relocating existing shards away from that node to other nodes with more available space.

Flood stage watermark: The default is 95%. When this threshold is reached, Elasticsearch makes all indices on that node read-only to prevent further data from being written, though reads are still possible. 




Configuration and use cases


Monitoring and prevention: These settings are crucial for preventing nodes from running out of storage, which can cause shard failures and instability.

Proactive scaling: You can set the thresholds based on your infrastructure's growth. For instance, if you anticipate nodes filling up quickly, you might set lower thresholds to proactively distribute the load.

Dynamic systems: Using percentage values is best for dynamic systems where disk sizes can vary. You can also use absolute byte values for fixed-size storage environments.

Cluster settings: These settings are cluster-wide and are managed through the Elasticsearch configuration file (elasticsearch.yml) or by using the cluster update settings API.

Node role-specific settings: For advanced setups, you can configure different flood stage watermarks for nodes with different roles, such as hot, warm, and cold nodes, allowing for more tailored allocation strategies. 

How to manage them


Review current settings: You can check the current watermark settings using the 
cluster.routing.allocation.disk.watermark.low, high, and flood_stage settings in the cluster settings API.

Change settings: To modify the watermarks, update the cluster settings via the API or by editing the elasticsearch.yml configuration file on the master nodes and restarting.

Recommendations: It's recommended to have enough buffer space (e.g., 3x the size of your largest shard) to handle shard relocation and potential growth.
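
For example, the watermarks can be adjusted dynamically via the cluster settings API (the values here are just the defaults listed above):

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "85%",
    "cluster.routing.allocation.disk.watermark.high": "90%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "95%"
  }
}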


To check the current allocation watermarks in the cluster:

GET _cluster/settings?include_defaults=true

Output snippet:

"routing": {
  "use_adaptive_replica_selection": "true",
  "rebalance": {
    "enable": "all"
  },
  "allocation": {
    "enforce_default_tier_preference": "true",
    "node_concurrent_incoming_recoveries": "2",
    "node_initial_primaries_recoveries": "4",
    "desired_balance": {
      "max_balance_computation_time_during_index_creation": "1s",
      "progress_log_interval": "1m",
      "undesired_allocations": {
        "log_interval": "1h",
        "threshold": "0.1"
      }
    },
    "same_shard": {
      "host": "false"
    },
    "total_shards_per_node": "-1",
    "type": "desired_balance",
    "disk": {
      "threshold_enabled": "true",
      "reroute_interval": "60s",
      "watermark": {
        "flood_stage.frozen.max_headroom": "20GB",
        "flood_stage": "95%",
        "high": "90%",
        "low": "85%",
        "flood_stage.frozen": "95%",
        "flood_stage.max_headroom": "100GB",
        "low.max_headroom": "200GB",
        "high.max_headroom": "150GB"
      }
    },
    "awareness": {
      "attributes": [
        "k8s_node_name"
      ]
    },
    "balance": {
      "disk_usage": "2.0E-11",
      "index": "0.55",
      "threshold": "1.0",
      "shard": "0.45",
      "write_load": "10.0"
    },
    "enable": "all",
    "node_concurrent_outgoing_recoveries": "2",
    "allow_rebalance": "always",
    "cluster_concurrent_rebalance": "2",
    "node_concurrent_recoveries": "2"
  }
},

We can see routing.allocation.disk.watermark settings.


If allocation of a shard onto a node of the target type fails, we can check the reason:

GET /_cluster/allocation/explain

...might produce output which reveals that the root cause for the shard being unassigned is insufficient storage on that node:

  "node_allocation_decisions": [
    {
      "node_id": "8r4E9pZL........wwAw",
      "node_name": "data-0",
      "transport_address": "10.22.31.122:9300",
      "node_attributes": {
        "k8s_node_name": "ip-10-22-18-240.us-east-1.compute.internal",
        "ml.machine_memory": "8589934592",
        "ml.max_jvm_size": "3221225472",
        "type": "data",
        "ml.allocated_processors_double": "1.0",
        "ml.config_version": "12.0.0",
        "transform.config_version": "10.0.0",
        "xpack.installed": "true",
        "ml.allocated_processors": "1"
      },
      "roles": [
        "data",
        "data_cold",
        "data_content",
        "data_frozen",
        "data_hot",
        "data_warm",
        "ingest",
        "master",
        "ml",
        "remote_cluster_client",
        "transform"
      ],
      "node_decision": "no",
      "deciders": [
        {
          "decider": "disk_threshold",
          "decision": "NO",
          "explanation": "the node is above the low watermark cluster setting [cluster.routing.allocation.disk.watermark.low=85%], having less than the minimum required [95.8gb] free space, actual free: [67.5gb], actual used: [89.4%]"
        }
      ]
    },

---