What is the ELK stack?
The ELK stack is a set of tools for searching, analyzing, and visualizing large volumes of data in real time. It is composed of three main components:
- Elasticsearch [https://www.elastic.co/elasticsearch]
- Logstash [https://www.elastic.co/logstash]
- Kibana [https://www.elastic.co/kibana]
What is it used for?
- aggregating logs from all systems and applications
- log analytics
- visualizations for application and infrastructure monitoring, faster troubleshooting, security analytics, etc.
image source: https://www.guru99.com/
Logstash
- Server-side data processing pipeline that ingests data from multiple sources simultaneously, transforms it, and then sends it to a "stash" such as Elasticsearch.
- Supports a variety of input sources, such as:
- log files (log shipper)
- databases
- message queues
- Allows for complex data transformations and filtering
- Makes it easy to transform source data and load it into an Elasticsearch cluster
Logstash configuration examples:
# Sample Logstash configuration for a simple
# file -> Logstash -> Elasticsearch pipeline.
input {
  file {
    path => "/home/my-app/.pm2/logs/my-app-out.log"
    start_position => "beginning"
    sincedb_path => "/opt/logstash/sincedb-access"
  }
}

filter {
  grok {
    match => { "message" => "%{DATA:timestamp} - info: processRequestMain: my-product: (input|output) sessionid = \{%{GREEDYDATA:session_id}\} (reqXml|resXml) = %{GREEDYDATA:content_xml}" }
  }
  # Drop events that did not match the grok pattern
  if "_grokparsefailure" in [tags] {
    drop { }
  }
  # Parse the captured XML payload into a structured field
  xml {
    source => "content_xml"
    target => "content"
  }
  # Emit one event per element of the nested app array
  split {
    field => "[content][app]"
  }
  # Tag each event with environment metadata
  mutate {
    add_field => {
      "env" => "${MYAPP_ENV}"
      "instance_id" => "${MYAPP_INSTANCE_ID}"
    }
  }
}

output {
  amazon_es {
    hosts => [ "search-myapp-dev-af6m6cidasgqsnmskxup2fh57y.us-east-1.es.amazonaws.com" ]
    region => "us-east-1"
    index => "logstash-myapp-%{+YYYY.MM.dd}"
  }
}
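To sanity-check what the grok pattern above actually captures, here is a small Python sketch that applies an equivalent regular expression (grok's DATA maps to `.*?`, GREEDYDATA to `.*`) to a sample log line; the line and its values are invented for illustration, shaped to match the pattern:

```python
import re

# Regex equivalent of the grok pattern in the Logstash config above.
PATTERN = re.compile(
    r"(?P<timestamp>.*?) - info: processRequestMain: my-product: "
    r"(?:input|output) sessionid = \{(?P<session_id>.*)\} "
    r"(?:reqXml|resXml) = (?P<content_xml>.*)"
)

# Hypothetical log line matching the expected format
sample = ("2024-01-15 10:32:01 - info: processRequestMain: my-product: "
          "input sessionid = {abc-123} reqXml = <app><name>demo</name></app>")

m = PATTERN.match(sample)
if m:
    print(m.group("timestamp"))    # 2024-01-15 10:32:01
    print(m.group("session_id"))   # abc-123
    print(m.group("content_xml"))  # <app><name>demo</name></app>
```

Lines that fail to match would get the `_grokparsefailure` tag in Logstash, which is why the config drops them early.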
Elasticsearch
- Distributed, RESTful search and analytics engine
- Built on Apache Lucene
- Used for storing (it is basically a Database), searching, and analyzing large volumes of data (e.g. logs) quickly and in near real-time
- Scalable, fast, and able to handle complex queries
- Distributed under the Elastic License (source-available, not OSI-approved open source)
- OpenSearch is an open-source alternative (forked and supported by AWS)
- Fluentd is an open-source data collection alternative (to Logstash)
- Data in the form of JSON documents is sent to Elasticsearch using:
- API
- Ingestion tools
- Logstash - e.g. pushing parsed logs to Elasticsearch
- Amazon Kinesis Data Firehose
- The original document is automatically stored and a searchable reference is added to the document in the cluster’s index
- Elasticsearch's REST-based API is used to manipulate documents:
- send
- search
- retrieve
- Uses schema-free JSON documents
- Distributed system
- Enables it to process large volumes of data in parallel, quickly finding the best matches for your queries
- Operations such as reading or writing data usually complete in under a second, so Elasticsearch suits near real-time use cases such as application monitoring and anomaly detection
- Has client support for many languages: Java, Python, PHP, JavaScript (Node.js), Ruby, etc.
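Because documents are sent as JSON over the REST API, a common ingestion pattern is the `_bulk` endpoint, which accepts NDJSON: one action line followed by one document line per entry, with a trailing newline. The sketch below only builds such a body (the index name and documents are hypothetical); actually POSTing it to a cluster at `http://<host>:9200/_bulk` with the `Content-Type: application/x-ndjson` header is left out:

```python
import json

def bulk_index_body(index, docs):
    """Build an NDJSON body for Elasticsearch's _bulk API:
    an action line followed by the document source, one pair per doc."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # _bulk requires a trailing newline

# Hypothetical documents and index name
docs = [
    {"message": "request served", "status": 200},
    {"message": "timeout", "status": 504},
]
body = bulk_index_body("logstash-myapp-2024.01.15", docs)
print(body)
```

Batching documents this way is much cheaper than one HTTP request per document, which is why log shippers and Logstash outputs use it under the hood.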
Kibana
- Visualisation and reporting tool
- Used with Elasticsearch to:
- visualize the data
- build interactive dashboards
Filebeat
- https://www.elastic.co/beats/filebeat
- log shipper
- both Filebeat and Logstash can be used to send logs from a file-based data source to a supported output destination
- Filebeat is a lightweight option, ideal for environments with limited resources and basic log parsing needs. Conversely, Logstash is tailored for scenarios that demand advanced log processing
- Filebeat and Logstash can also be used in tandem when building a logging pipeline with the ELK Stack, since each serves a different function: Filebeat ships the logs, Logstash parses and transforms them
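A minimal Filebeat-to-Logstash setup might look like the sketch below (the paths and the Logstash host are assumptions for illustration): Filebeat tails the files and forwards raw lines, while Logstash listens on its beats input port and does the heavy parsing.

```yaml
# filebeat.yml (sketch - paths and host are assumptions)
filebeat.inputs:
  - type: filestream
    paths:
      - /home/my-app/.pm2/logs/*.log

output.logstash:
  hosts: ["localhost:5044"]
```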