Showing posts with label Grafana. Show all posts

Monday, 23 February 2026

Introduction to Grafana Loki




Grafana Loki:

These are the notes from the Loki Helm chart:

***********************************************************************
  Welcome to Grafana Loki
  Chart version: 6.31.0
  Chart Name: loki
  Loki version: 3.5.0
***********************************************************************

Tip:

Watch the deployment status using the command: kubectl get pods -w --namespace grafana-loki

If pods are taking too long to schedule, make sure pod affinity can be fulfilled in the current cluster.

***********************************************************************
Installed components:
***********************************************************************
* gateway
* read
* write
* backend


***********************************************************************
Sending logs to Loki
***********************************************************************

Loki has been configured with a gateway (nginx) to support reads and writes from a single component.

You can send logs from inside the cluster using the cluster DNS:

http://loki-gateway.grafana-loki.svc.cluster.local/loki/api/v1/push

You can test sending data from outside the cluster by port-forwarding the gateway to your local machine:

  kubectl port-forward --namespace grafana-loki svc/loki-gateway 3100:80 &

Then use the http://127.0.0.1:3100/loki/api/v1/push URL as shown below:

curl \
-H "Content-Type: application/json" \
-XPOST \
-s "http://127.0.0.1:3100/loki/api/v1/push"  \
--data-raw "{\"streams\": [{\"stream\": {\"job\": \"test\"}, \"values\": [[\"$(date +%s)000000000\", \"fizzbuzz\"]]}]}" \
-H X-Scope-OrgId:foo
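The payload above can be assembled step by step; a minimal sketch (the `job` label and the message are the same arbitrary placeholders as in the curl example):

```shell
# Loki expects timestamps as nanosecond-precision epoch strings,
# hence the nine zeros appended to `date +%s` (seconds).
ts="$(date +%s)000000000"

# One stream, identified by its label set, with one [timestamp, line] pair.
payload="{\"streams\": [{\"stream\": {\"job\": \"test\"}, \"values\": [[\"$ts\", \"fizzbuzz\"]]}]}"
echo "$payload"
```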


Then verify that Loki received the data using the following command:

curl "http://127.0.0.1:3100/loki/api/v1/query_range" \
--data-urlencode 'query={job="test"}' \
-H X-Scope-OrgId:foo | jq .data.result
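For reference, here is a trimmed-down example of what `query_range` returns and how to extract just the log lines with jq (the response body below is a hypothetical sample, not real output):

```shell
# Abridged query_range response; real responses include more metadata.
resp='{"status":"success","data":{"result":[{"stream":{"job":"test"},"values":[["1700000000000000000","fizzbuzz"]]}]}}'

# Each result entry carries a label set plus [timestamp, line] pairs;
# index 1 of each pair is the raw log line.
lines=$(echo "$resp" | jq -r '.data.result[].values[][1]')
echo "$lines"
```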

***********************************************************************
Connecting Grafana to Loki
***********************************************************************

If Grafana runs inside the cluster, you can set up a new Loki data source using the following URL:

http://loki-gateway.grafana-loki.svc.cluster.local/

***********************************************************************
Multi-tenancy
***********************************************************************

Loki is configured with auth enabled (multi-tenancy) and expects tenant headers (`X-Scope-OrgID`) to be set for all API calls.

You must configure Grafana's Loki datasource using the `HTTP Headers` section with the `X-Scope-OrgID` to target a specific tenant.
For each tenant, you can create a different datasource.
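For example, with Grafana's file-based provisioning, a per-tenant Loki data source could be declared as follows (the data source name, URL, and tenant value are placeholders):

```yaml
apiVersion: 1
datasources:
  - name: Loki (tenant foo)
    type: loki
    access: proxy
    url: http://loki-gateway.grafana-loki.svc.cluster.local
    jsonData:
      httpHeaderName1: "X-Scope-OrgID"
    secureJsonData:
      httpHeaderValue1: "foo"   # tenant ID sent with every request
```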

The agent of your choice must also be configured to propagate this header.
For example, when using Promtail you can use the `tenant` stage. https://grafana.com/docs/loki/latest/send-data/promtail/stages/tenant/
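A minimal sketch of the `tenant` stage in a Promtail pipeline (the tenant value "foo" is a placeholder):

```yaml
# Excerpt from a Promtail scrape_configs entry:
pipeline_stages:
  - tenant:
      value: "foo"   # static tenant ID, sent as X-Scope-OrgID; use `source` to read it from a label instead
```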

When not provided with the `X-Scope-OrgID` while auth is enabled, Loki will reject reads and writes with a 404 status code `no org id`.

You can also use a reverse proxy, to automatically add the `X-Scope-OrgID` header as suggested by https://grafana.com/docs/loki/latest/operations/authentication/

For more information, read our documentation about multi-tenancy: https://grafana.com/docs/loki/latest/operations/multi-tenancy/

> When using curl you can pass `X-Scope-OrgId` header using `-H X-Scope-OrgId:foo` option, where foo can be replaced with the tenant of your choice.
---

Friday, 20 February 2026

Grafana Observability Stack

 




Grafana uses these components together as an observability stack, but each has a clear role:


Loki – log database. It stores and indexes logs (especially from Kubernetes) in a cost‑efficient, label‑based way, similar to Prometheus but for logs.

Tempo – distributed tracing backend. It stores distributed traces (spans) from OpenTelemetry, Jaeger, Zipkin, etc., so you can see call flows across microservices and where latency comes from.

Mimir – Prometheus‑compatible metrics backend. It is a horizontally scalable, long‑term storage and query engine for Prometheus‑style metrics (time series).

Alloy – telemetry pipeline (collector). It is Grafana’s distribution of the OpenTelemetry Collector (covering the ground previously served by the Prometheus agent and Promtail), used to collect, process, and forward metrics, logs, traces, and profiles into Loki/Tempo/Mimir (or other backends).
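To make Alloy's collector role concrete, here is a minimal sketch of an Alloy configuration that tails Kubernetes pod logs and forwards them to Loki (the service URL and component labels are illustrative assumptions):

```river
// Discover pods in the cluster.
discovery.kubernetes "pods" {
  role = "pod"
}

// Tail logs of the discovered pods via the Kubernetes API.
loki.source.kubernetes "pods" {
  targets    = discovery.kubernetes.pods.targets
  forward_to = [loki.write.default.receiver]
}

// Push everything to Loki.
loki.write "default" {
  endpoint {
    url = "http://loki-gateway.grafana-loki.svc.cluster.local/loki/api/v1/push"
  }
}
```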


How Grafana UI relates to them


Grafana UI itself is “just” the visualization and alerting layer:

  • It connects to Loki, Tempo, Mimir (and many others) as data sources.
  • For each backend you configure:
    • A Loki data source for logs.
    • A Tempo data source for traces.
    • A Prometheus/Mimir data source for metrics (Mimir exposes a Prometheus‑compatible API).
  • Grafana then lets you:
    • Build dashboards and alerts from Mimir metrics.
    • Explore logs from Loki.
    • Explore traces from Tempo and cross‑link them with logs/metrics (e.g., click from a log line to a trace, or from a metrics graph into logs/traces).

A useful mental model: Loki/Tempo/Mimir are databases, Alloy is the collector/router, and Grafana is the UI on top.


Are they deployed in the same Kubernetes cluster?


Common patterns:

  • Very common: deploy Loki, Tempo, Mimir, Alloy, and Grafana in the same Kubernetes cluster as your apps. This is the typical “in‑cluster LGTM” setup; all telemetry stays inside the cluster and traffic is simple.
  • Also common: run them in a separate observability cluster (or use Grafana Cloud backends), while Alloy/agents run in each workload cluster and ship data over the network. This improves isolation and makes it easier to share one observability stack across many clusters.
  • In smaller setups or dev environments, everything (apps + LGTM + Grafana) often lives in one cluster; in larger/regulated setups, people tend to separate “workload clusters” and an “observability cluster”.

So: they don’t have to be on the same cluster, but it’s perfectly normal (and often simplest) to run Grafana + Loki + Tempo + Mimir + Alloy together in a single Kubernetes cluster and point your apps’ telemetry to Alloy.


Why not use Elasticsearch instead of Loki, Tempo, and Mimir?


Elasticsearch can replace part of what Loki, Tempo, and Mimir do, but not all of it, and usually with higher cost/complexity for cloud‑native observability.

1. Scope: logs vs full observability


Elasticsearch is a general search and analytics engine that’s great at full‑text search, aggregations, and analytics over documents (including logs).

The LGTM stack is explicitly split by signal:
  • Loki → logs
  • Tempo → traces
  • Mimir → metrics

Each is optimized only for its signal type and integrates tightly with Grafana and modern telemetry standards.

You could plausibly replace Loki with Elasticsearch for logs, but Elasticsearch does not natively replace Tempo (distributed tracing backend) or Mimir (Prometheus‑compatible metrics backend).

2. Logs: Loki vs Elasticsearch


Elasticsearch strengths:
  • Very powerful full‑text search, fuzzy matching, relevance scoring, complex aggregations.
  • Good when you need deep forensic search and advanced analytics on log text.

Loki strengths:
  • Stores logs as compressed chunks plus a small label index, so storage and compute are much cheaper than Elasticsearch for typical Kubernetes logs.
  • Very tight integration with Grafana and the rest of LGTM, and simple, label‑based querying.

Trade‑off: Elasticsearch gives richer search at a higher infrastructure and operations cost; Loki gives “good enough” search for operational troubleshooting at much lower cost and operational burden.

3. Traces and metrics: Tempo & Mimir vs “just ES”


Tempo:
  • Implements distributed tracing concepts (spans, traces, service graphs) and OpenTelemetry/Jaeger/Zipkin protocols; the data model and APIs are specialized for traces.
  • Elasticsearch can store trace‑like JSON documents, but you’d have to build/maintain all the trace stitching, UI navigation, and integrations yourself.

Mimir:
  • Is a horizontally scalable, Prometheus‑compatible time‑series database, with native remote‑write/read and PromQL semantics.
  • Elasticsearch can store time‑stamped metrics, but you lose Prometheus compatibility, PromQL semantics, and the whole ecosystem that expects a Prometheus‑style API.

So using only Elasticsearch means you’re giving up the standard metrics and tracing ecosystems and rebuilding a lot of tooling on top of a generic search engine.

4. Cost, complexity, and operational burden


Elasticsearch clusters generally need:
  • More RAM/CPU per node, careful shard and index management, and capacity planning.
  • Storage overhead from full‑text indexes (often 1.5–3× raw log size plus replicas).

Loki/Tempo/Mimir:
  • Are designed for object storage, compression, and label‑only indexing, which dramatically lowers storage and compute requirements for logs and metrics.
  • Have simpler, well‑documented reference architectures specifically for observability.

For a modern Kubernetes‑centric environment, that usually makes LGTM cheaper and easier to run than a single big Elasticsearch cluster for everything.

5. When Elasticsearch still makes sense


You might still choose Elasticsearch (often with Kibana/APM) if:
  • You already have a strong ELK stack and team expertise.
  • Your primary need is deep, flexible text search and analytics over logs, with less emphasis on Prometheus/OTel ecosystems.
  • You want Elasticsearch’s ML/anomaly‑detection features and are willing to pay the operational cost.

But if your goal is a Grafana‑centric, standards‑based (Prometheus + OpenTelemetry) observability platform, LGTM (Loki+Tempo+Mimir, plus Alloy as collector) is a better fit than trying to push everything into Elasticsearch.

---

Tuesday, 14 May 2024

Introduction to Grafana

 



What is Grafana?

  • Web application for:
    • analytics
    • interactive visualization – often a component in monitoring stacks in combination with:
      • time series databases:
        • InfluxDB
        • Prometheus
        • Graphite
      • monitoring platforms:
        • Sensu
        • Icinga
        • Checkmk
        • Zabbix
        • Netdata
        • PRTG
      • SIEMs (Security Information and Event Management - collects logs and events, normalizing this data for further analysis that can manifest as visualizations, alerts, searches, reports, and more.):
        • Elasticsearch
        • Splunk
      • other data sources.
  • Produces charts, graphs, and alerts for the web when connected to supported data sources
  • Multi-platform
    • Microsoft Windows
    • Linux
    • macOS
  • Licenses:
    • open source
    • licensed Grafana Enterprise
      • additional capabilities
      • sold as a self-hosted installation or through an account on the Grafana Labs cloud service
  • Expandable through a plug-in system
  • Complex monitoring dashboards can be built via interactive query builders

How to start with the Grafana Web Application?


The Grafana web app shows a list of:
  • Dashboards
    • for data visualization
    • can be grouped into folders
  • Playlists
    • groups of dashboards that are displayed in a sequence
    • they can be used to cycle dashboards on TVs without user control 
  • Snapshots
    • interactive, publicly available, point-in-time representations of dashboards
  • Library panels
    • Reusable panels that can be added to multiple dashboards



How to create a new Dashboard?

We can add a visualisation by selecting a data source and then querying and visualising data with charts, stats, and tables, or by creating lists, markdown panels, and other widgets.


There is also a drop-down menu in the context of the dashboard, with the same content:


Adding a visualization actually adds a new panel:


We can toggle a Table view and see data points as rows in a table instead of the graph:


In the right-hand side panel we can choose Visualisation type:

For example, Bar chart would look like this:


The Suggestions tab shows thumbnails for various visualisations:


Related panels can be grouped into rows.


How to use Amazon CloudWatch as Grafana data source?


A Grafana admin can create a new Amazon CloudWatch data source by specifying the following:


Make sure that the IAM user that Grafana will be using to authenticate to AWS has a proper access policy attached, as per the Amazon CloudWatch data source | Grafana documentation.
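As a rough sketch (the authoritative action list is in the Grafana documentation linked above), a minimal read-only policy for CloudWatch metrics might look like:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "cloudwatch:DescribeAlarmsForMetric",
        "cloudwatch:ListMetrics",
        "cloudwatch:GetMetricData",
        "cloudwatch:GetMetricStatistics"
      ],
      "Resource": "*"
    }
  ]
}
```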


How to migrate a dashboard from one to another instance of Grafana?

Click the Share icon, which is to the right of the dashboard name:


Click the Export tab and tick the Export for sharing externally checkbox:


Save the file. In another Grafana instance, click the Import button:


And then upload the exported JSON file:


Make sure you select the correct data source (which needs to be set up in the same way as the corresponding data source in the origin Grafana instance):



Click Import and the dashboard will appear, with all its panels.


---