Thursday, 12 June 2025

Useful Kibana DevTools Queries


Elasticsearch’s query DSL is structured by query types.

Every top-level query must be one of the defined types:
  • match
  • term
  • range
  • bool
  • wildcard
  • query_string
  • function_score
  • etc.
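
For instance, a wildcard query and a query_string query follow the same top-level shape (a minimal sketch; the index and field names are placeholders):

GET my_index/_search
{
  "query": {
    "wildcard": {
      "hostname.keyword": "web-*"
    }
  }
}

GET my_index/_search
{
  "query": {
    "query_string": {
      "query": "status:active AND user:john*"
    }
  }
}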

Main APIs:

  • _cat - for a human-readable summary
  • _stats - for a detailed JSON response
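
For example, both of these report on indices; the first returns a plain-text table, the second a detailed JSON document:

GET _cat/indices?v
GET /my_index/_stats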

Failed Queries


Example:

{
  "statusCode": 502,
  "error": "Bad Gateway",
  "message": "Client request timeout for: https://my.elastic-system.svc:9200 with request GET /my_index/_search?pretty=true"
}


A 502 Bad Gateway combined with a timeout usually means the Elasticsearch engine is struggling to process the request, or there is a networking bottleneck between Kibana and Elasticsearch.
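
If Elasticsearch itself turns out to be healthy, one knob worth checking on the Kibana side is its request timeout. A minimal kibana.yml sketch (elasticsearch.requestTimeout is a real Kibana setting; the raised value here is just an example):

# kibana.yml
# Give slow queries more time before Kibana gives up (default: 30000 ms)
elasticsearch.requestTimeout: 120000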



Cluster

To check cluster health:

GET /_cluster/health

GET /_cluster/health?level=shards

The output contains a status field, which can be green, yellow or red.
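
An illustrative (trimmed) response:

{
  "cluster_name": "my-cluster",
  "status": "yellow",
  "number_of_nodes": 3,
  "active_primary_shards": 10,
  "active_shards": 18,
  "unassigned_shards": 2,
  "active_shards_percent_as_number": 90.0
}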

To check status of each shard:

GET _cat/shards?v&h=index,shard,prirep,state,unassigned.reason,node

The output shows whether a shard is a primary (p) or a replica (r). It also shows the shard's state, which can be e.g. STARTED or UNASSIGNED, and the unassigned reason, which can be e.g. ALLOCATION_FAILED.
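
Illustrative output (values made up):

index    shard prirep state      unassigned.reason node
my_index 0     p      STARTED                      node-1
my_index 0     r      UNASSIGNED ALLOCATION_FAILED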

To sort the output by some column, we can use the s parameter:

GET /_cat/shards?v&h=index,shard,prirep,state,unassigned.reason,node,store&s=state
GET /_cat/shards?v&h=index,shard,prirep,state,unassigned.reason,node,store&s=node
GET /_cat/shards?v&h=index,shard,prirep,state,unassigned.reason,node,store&s=index 

To sort in descending order, append :desc to the name of the sorted column:

GET /_cat/shards?v&h=index,shard,prirep,state,unassigned.reason,node,store&s=store:desc



To get memory allocation and consumption per node:

GET /_cat/allocation?v&s=node

The output contains the following columns:
  • shards (number)
  • shards.undesired
  • write_load.forecast
  • disk.indices.forecast (in GB or TB)
  • disk.indices (in GB or TB)
  • disk.used (in GB or TB)
  • disk.avail (in GB or TB)
  • disk.total (in GB or TB)
  • disk.percent (number, %)
  • host (IP address)
  • ip (IP address)
  • node (node name or UNASSIGNED)
  • node.role (combination of cdfhilmrstw)
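
An illustrative row, with a subset of the columns (values made up; the forecast columns only appear on recent Elasticsearch versions):

shards disk.indices disk.used disk.avail disk.total disk.percent host     ip       node
    42        1.2gb     5.3gb     94.7gb      100gb            5 10.0.0.1 10.0.0.1 node-1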

If some shard is not allocated, we can check the reason:

GET /_cluster/allocation/explain
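
A trimmed sketch of the kind of response it returns (field values are illustrative):

{
  "index": "my_index",
  "shard": 0,
  "primary": false,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "ALLOCATION_FAILED",
    "details": "..."
  },
  "can_allocate": "no",
  "allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes"
}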

To manually trigger retry of all previously failed shard allocations:

POST /_cluster/reroute?retry_failed=true

To check the progress, re-check the cluster health and monitor the recovery:

GET /_cat/recovery/my_index?v



Index



In Elasticsearch, every index has a Mapping. Think of it like a database table definition. If a field isn't explicitly defined or hasn't been automatically detected from an uploaded document, Elasticsearch acts like that field doesn't exist. You can't sort by a column that the database doesn't know about!

To find out which fields exist in an index:

GET /my_index/_mapping


To perform a search operation on a specific index:

GET /my_index/_search 

By itself (without a request body), it returns the first 10 documents by default. The following request is equivalent to the one above:

GET /my_index/_search
{
  "query": {
    "match_all": {}
  }
}

In Kibana's Dev Tools, the query parameter in a search request's body defines the search criteria: it tells Elasticsearch "find me documents that match these conditions" and determines which documents from our index are returned in the response.

The query object can contain various types of queries. Common query types:

match_all - Returns all documents:

{
  "query": {
    "match_all": {}
  }
}

match - Full-text search on a specific field:

{
  "query": {
    "match": {
      "field_name": "search_term"
    }
  }
}

term - Exact term matching:

{
  "query": {
    "term": {
      "status": "active"
    }
  }
}



To find all documents written in the past minute:

GET my_index/_search
{
  "query": {
    "range": {
      "timestamp": {
        "gte": "now-1m",
        "lte": "now"
      }
    }
  }
}


For X days use: Xd
For X hours use: Xh

bool


bool - Combine multiple queries with logical operators:

{
  "query": {
    "bool": {
      "must": [
        {"match": {"title": "elasticsearch"}},
        {"range": {"date": {"gte": "2023-01-01"}}}
      ]
    }
  }
}


bool is the workhorse of combinational querying in Elasticsearch.
It's like the logical brain of the query DSL.

A bool query lets you combine multiple conditions (AND, OR, NOT) into one query. Think of it like this:

(bool)
 ├── must → AND conditions
 ├── filter → AND but cheap
 ├── should → OR conditions
 └── must_not → NOT conditions


Without a bool query, Elasticsearch can only run one query condition at a time. But real searches need multiple conditions.

For example:
  • logs in the last 7 days
  • log_group is X
  • success = false
  • AND NOT status=200
  • AND (message contains A OR B)

You need logical operators → that's what bool provides.
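
For instance, the wish list above could be sketched as a single bool query (log_group, success, status and message are hypothetical field names):

GET logs-*/_search
{
  "query": {
    "bool": {
      "must": [
        { "range": { "@timestamp": { "gte": "now-7d", "lte": "now" } } },
        { "term": { "log_group": "X" } },
        { "term": { "success": false } }
      ],
      "must_not": [
        { "term": { "status": 200 } }
      ],
      "should": [
        { "match": { "message": "A" } },
        { "match": { "message": "B" } }
      ],
      "minimum_should_match": 1
    }
  }
}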

Within bool we can use filter instead of must.

filter is not a standalone query. It is only one part of the bool query type.

Filters are:
  • cached
  • more efficient
  • ideal for exact and boolean conditions

Everything inside filter is combined with AND.

"filter": [
  { cond1 },
  { cond2 },
  { cond3 }
]


This means:

cond1 AND cond2 AND cond3



filter is not a query type — it is an instruction telling the bool query how to treat subqueries (as cached, non-scoring, mandatory matches). 

So this:

"filter": [...]

doesn’t make sense by itself — filter what?

Whereas this:

"bool": {
  "filter": [...]
}

...means: "Run these filter queries together inside a boolean query."

Example:

GET logs-*/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "@timestamp": {
              "gte": "now-7d",
              "lte": "now"
            }
          }
        },
        {
          "term": {
            "log_group": "/aws/lambda/my-lambda"
          }
        },
        {
          "term": {
            "mycorp.message.my-lambda.success": false
          }
        }
      ]
    }
  }
}


All must clauses must also match.

"must": [
  { cond1 },
  { cond2 }
]

Meaning:

cond1 AND cond2


Inside should, clauses are combined with OR; minimum_should_match controls how many of them must match (the default is 1 when the bool has no must or filter clauses, and 0 otherwise).

"should": [
  { cond1 },
  { cond2 }
]

Meaning:

cond1 OR cond2

must_not: a document matches only if none of these clauses match.

"must_not": [
  { cond1 }
]

Meaning:

NOT cond1
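
When should is combined with must or filter clauses, remember minimum_should_match, since otherwise the should clauses only influence scoring. A sketch (status and message are hypothetical fields):

GET my_index/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "status": "active" } }
      ],
      "should": [
        { "match": { "message": "timeout" } },
        { "match": { "message": "refused" } }
      ],
      "minimum_should_match": 1
    }
  }
}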



range - Query for values within a range:

{
  "query": {
    "range": {
      "age": {
        "gte": 18,
        "lte": 65
      }
    }
  }
}


To get the number of documents in an Elasticsearch index, you can use the _count API or the _stats API.

GET /my_index/_count

This will return a response like:

{
  "count": 12345,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  }
}


To get a certain number of documents, use the size parameter:

GET my_index/_search?size=900
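
size can also go in the request body and be combined with from for paging. Note that from + size cannot exceed index.max_result_window (10000 by default):

GET my_index/_search
{
  "from": 100,
  "size": 50,
  "query": { "match_all": {} }
}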

We can also use the _cat API:

GET /_cat/count/my_index?v

This will return output like:

epoch      timestamp count
1718012345 10:32:25  12345


Finally, the _stats API reports document counts as well; the relevant fragment of its response is shown below:

GET /my_index/_stats

"indices": {
  "my_index": {
    "primaries": {
      "docs": {
        "count": 12345,
        "deleted": 12
      }
    }
  }
}


To get the union of all values of some field, e.g. the channel_type field, across all documents in the my_index index, we can use an Elasticsearch terms aggregation:


GET my_index/_search
{
  "size": 0, 
  "aggs": {
    "unique_channel_types": {
      "terms": {
        "field": "channel_type.keyword",
        "size": 10000  // increase if you expect many unique values
      }
    }
  }
}


Explanation:
  • "size": 0: No documents returned, just aggregation results.
  • "terms": Collects unique values.
  • "channel_type.keyword": Use .keyword to aggregate on the raw value (not analyzed text).
  • "size": 10000: Max number of buckets (unique values) to return. Adjust as needed.

Response example:

{
  "aggregations": {
    "unique_channel_types": {
      "buckets": [
        { "key": "email", "doc_count": 456 },
        { "key": "push", "doc_count": 321 },
        { "key": "sms", "doc_count": 123 }
      ]
    }
  }
}

The "key" values in the buckets array are your union of channel_type values.


Let's assume that my_index has a @timestamp field (as a root field, though it could be at any path, in which case we'd need to adjust the query) that is correctly mapped as a date type.


To get the oldest document:

GET my_index/_search
{
  "size": 1,
  "sort": [
    { "@timestamp": "asc" }
  ]
}


To get the newest document:

GET my_index/_search
{
  "size": 1,
  "sort": [
    { "@timestamp": "desc" }
  ]
}


Sorting by a field that isn't indexed or doc_values-enabled (especially on a large or unoptimized index) can cause the memory usage to spike and the request to hang until it times out.


How do we get all possible values of some field across all documents added to the index in the last 24 hours?

We can use a terms aggregation combined with a range query:


GET /my_index/_search
{
  "size": 0,
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-24h/h",
        "lte": "now"
      }
    }
  },
  "aggs": {
    "unique_values": {
      "terms": {
        "field": "my_field.keyword",
        "size": 10000
      }
    }
  }
}


To check the number of documents older than N days:


POST my_index/_count
{
  "query": {
    "range": {
      "@timestamp": {
        "lt": "now-Nd/d"
      }
    }
  }
}
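
N is a placeholder and must be replaced with a literal number. The /d suffix is Elasticsearch date math: it rounds now-Nd down to the start of that day, so the cutoff falls on a whole-day boundary. For example, for N = 30:

POST my_index/_count
{
  "query": {
    "range": {
      "@timestamp": {
        "lt": "now-30d/d"
      }
    }
  }
}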

Delete all documents older than N days:

POST my_index/_delete_by_query?conflicts=proceed&wait_for_completion=false
{
  "query": {
    "range": {
      "@timestamp": {
        "lt": "now-Nd/d"
      }
    }
  }
}

The output of the above command is a task ID:

{
  "task": "BeJWGidDTtWkRL9aQEkjhg:26425516"
}

To check all tasks, grouped by node on which they are running:

GET _tasks

The output shows each task's id, action name (e.g. "indices:data/write/bulk[s][p]", "cluster:monitor/tasks/lists[n]") and type (e.g. transport, monitoring, ...).

To check the task status:

GET _tasks/<task_id>
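
For a delete_by_query task, a trimmed response might look like this (values are illustrative):

{
  "completed": false,
  "task": {
    "node": "BeJWGidDTtWkRL9aQEkjhg",
    "id": 26425516,
    "action": "indices:data/write/delete/byquery",
    "status": {
      "total": 500000,
      "deleted": 120000,
      "batches": 121
    }
  }
}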

To check if any delete_by_query task is running and the number of docs deleted so far:

GET _tasks?actions=*delete/byquery&detailed=true


Once the delete_by_query task completes, the deletion is done, but disk space might not yet be reclaimed. To free the disk space, run a force merge:

POST my_index/_forcemerge?only_expunge_deletes=true

For a huge shard, if you delete in chunks (e.g. weekly ranges), consider force-merging after several chunks rather than after every single one, to reduce I/O spikes.

Check if any forcemerge tasks are running:

GET _tasks?actions=*forcemerge
GET _tasks?actions=*forcemerge&detailed=true
GET _tasks?actions=*forcemerge&detailed=true&group_by=parents

Check the number of merges (per shard):

GET my_index/_stats?level=shards



To get the number of shards in an index:

GET my-index/_settings?filter_path=**.number_of_shards


To get the number of replicas:

GET my-index/_settings?filter_path=**.number_of_replicas

Output:

{
  "my-index": {
    "settings": {
      "index": {
        "number_of_replicas": "1"
      }
    }
  }
}


How to find out the disk size used by some index?

GET /_cat/indices/your-index-name*?v&h=index,docs.count,store.size,pri.store.size&s=store.size:desc

  • store.size: Total size on disk (includes all primary shards and replica shards).
  • pri.store.size: Size of only the primary shards (useful for knowing the "true" data size without redundancy).
  • s=store.size:desc: Sorts the list by size (useful if you are using a wildcard *).

If you need a precise number for an automated script or a deeper dive into memory usage, use the _stats endpoint.

GET /your-index-name*/_stats/store

In the JSON response, look for:
  • total.store.size_in_bytes: The exact byte count for the whole index (primaries + replicas).
  • primaries.store.size_in_bytes: The exact byte count for just the primary data.

To see which fields take up the most space (this API refuses to run without the run_expensive_tasks=true flag, since the analysis is resource-intensive):

POST /my-index/_disk_usage?run_expensive_tasks=true

To get the frequency of writes (document ingestion) over the last 24 hours, broken down by the hour, you should use a Date Histogram aggregation. In Elasticsearch, "writing into an index" typically equates to the creation of new documents with a @timestamp.

GET /your-index-name*/_search
{
  "size": 0, 
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-24h",
        "lte": "now"
      }
    }
  },
  "aggs": {
    "writes_per_hour": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "1h",
        "extended_bounds": {
          "min": "now-24h",
          "max": "now"
        }
      }
    }
  }
}


Breakdown of the Query:
  • "size": 0: Tells Elasticsearch we don't want to see the actual documents (the "hits"), just the statistical summary.
  • range: Filters the data to only include documents from the last 24 hours.
  • date_histogram: This is the magic part. It buckets your data by time.
  • fixed_interval: "1h": Groups the results into 1-hour chunks.
  • extended_bounds: Ensures that even if an hour had zero writes, it still shows up in your list as a "0" count rather than being skipped entirely.
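
A trimmed, illustrative response (bucket values are made up):

{
  "aggregations": {
    "writes_per_hour": {
      "buckets": [
        { "key_as_string": "2025-06-12T09:00:00.000Z", "doc_count": 1532 },
        { "key_as_string": "2025-06-12T10:00:00.000Z", "doc_count": 0 },
        { "key_as_string": "2025-06-12T11:00:00.000Z", "doc_count": 987 }
      ]
    }
  }
}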


