Friday, 16 February 2024

Introduction to ELK Stack

What is the ELK stack?

The ELK stack is a set of tools used for searching, analyzing, and visualizing large volumes of data in near real time. It is composed of three main components: Elasticsearch, Logstash, and Kibana, whose initials form the name.

What is it used for?
  • aggregating logs from all systems and applications
  • log analytics
  • visualizations for application and infrastructure monitoring, faster troubleshooting, security analytics, etc.


Logstash

  • Server-side data processing pipeline that ingests (takes in) data from multiple sources simultaneously, transforms it, and then sends it to a "stash" such as Elasticsearch
  • Supports a variety of input sources, such as:
    • log files (log shipper)
    • databases
    • message queues
  • Allows for complex data transformations and filtering
  • Makes it easy to transform source data and load it into an Elasticsearch cluster

Logstash configuration example:

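# Sample Logstash configuration for a simple
# log file -> Logstash -> Elasticsearch (Amazon ES) pipeline.

input {
        file {
                path => "/home/my-app/.pm2/logs/my-app-out.log"
                # Read the file from the start the first time it is seen
                start_position => "beginning"
                # File in which Logstash records how far it has read
                sincedb_path => "/opt/logstash/sincedb-access"
        }
}

filter {
        # Extract the timestamp, session ID and XML payload from each line
        grok {
                match => { "message" => "%{DATA:timestamp} - info: processRequestMain: my-product: (input|output) sessionid = \{%{GREEDYDATA:session_id}\} (reqXml|resXml) = %{GREEDYDATA:content_xml}" }
        }

        # Discard lines that did not match the pattern above
        if "_grokparsefailure" in [tags] {
                drop { }
        }

        # Parse the XML payload into the "content" field
        xml {
                source => "content_xml"
                target => "content"
        }

        # Emit one event per element of the nested [content][app] array
        split {
                field => "[content][app]"
        }

        # Tag each event with values taken from environment variables
        mutate {
                add_field => {
                        "env" => "${MYAPP_ENV}"
                        "instance_id" => "${MYAPP_INSTANCE_ID}"
                }
        }
}

output {
        # amazon_es is provided by the logstash-output-amazon_es plugin
        amazon_es {
                hosts => [ "" ]
                region => "us-east-1"
                # One index per day
                index => "logstash-myapp-%{+YYYY.MM.dd}"
        }
}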
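
To see what the filter stage extracts, the grok pattern above can be approximated with an ordinary regular expression. The Python sketch below is only an illustration, not how Logstash works internally: DATA and GREEDYDATA are replaced by their standard definitions (".*?" and ".*"), the sample log line is invented, and the function names parse_line and split_events are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# DATA -> non-greedy ".*?", GREEDYDATA -> greedy ".*" (standard grok definitions).
LINE_RE = re.compile(
    r"(?P<timestamp>.*?) - info: processRequestMain: my-product: "
    r"(?:input|output) sessionid = \{(?P<session_id>.*)\} "
    r"(?:reqXml|resXml) = (?P<content_xml>.*)"
)


def parse_line(message):
    """Return the extracted fields, or None where Logstash would tag
    the event with _grokparsefailure (and the config would drop it)."""
    match = LINE_RE.search(message)
    return match.groupdict() if match else None


def split_events(fields):
    """Roughly mimic the xml + split filters: one event per <app> element."""
    root = ET.fromstring(fields["content_xml"])
    for app in root.findall("app"):
        yield {**fields, "content": {"app": app.text}}


# Hypothetical log line, invented for illustration only.
sample = (
    "2024-02-16T10:15:30.123Z - info: processRequestMain: my-product: "
    "input sessionid = {abc-123} reqXml = "
    "<request><app>one</app><app>two</app></request>"
)

fields = parse_line(sample)
events = list(split_events(fields))
print(fields["timestamp"], fields["session_id"], len(events))
```

Trying a pattern against a few real log lines this way, before deploying the configuration, makes it easier to spot lines that would silently end up in the drop branch.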


Elasticsearch

  • Distributed, RESTful search and analytics engine
  • Built on Apache Lucene
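  • Used for storing (it is essentially a document database), searching, and analyzing large volumes of data (e.g. logs) quickly and in near real time
  • Scalable, fast, and able to handle complex queries
  • Source-available rather than open source: since version 7.11 it is distributed under the Elastic License and SSPL instead of Apache 2.0
    • OpenSearch is an open-source fork (Apache 2.0) backed by AWS
    • Fluentd is an open-source alternative on the data collection side (i.e. to Logstash rather than to Elasticsearch)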
  • Data in the form of JSON documents is sent to Elasticsearch using:
    • API
    • Ingestion tools
      • Logstash, e.g. pushing logs to Elasticsearch
      • Amazon Kinesis Data Firehose
  • The original document is stored automatically, and a searchable reference to it is added to the cluster’s index
  • The Elasticsearch REST API is used to work with documents:
    • send
    • search
    • retrieve 
  • Uses schema-free JSON documents
  • Distributed system
    • Enables it to process large volumes of data in parallel, quickly finding the best matches for your queries
  • Operations such as reading or writing data usually take less than a second to complete, so Elasticsearch suits near real-time use cases such as application monitoring and anomaly detection
  • Client libraries are available for many languages: Java, Python, PHP, JavaScript (Node.js), Ruby, etc.
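
The send, search and retrieve operations listed above map onto plain HTTP requests. The Python sketch below only builds those requests with the standard library so it can run without a cluster. The base URL (a local node on the default port 9200, without authentication), the index name and the document fields are assumptions made for illustration, and the helper names are hypothetical.

```python
import json
from urllib.request import Request

# Assumption: a local, unsecured test node on the default port.
BASE_URL = "http://localhost:9200"
JSON_HEADERS = {"Content-Type": "application/json"}


def index_request(index, doc):
    """Send: POST /<index>/_doc stores a JSON document under a generated ID."""
    return Request(
        f"{BASE_URL}/{index}/_doc",
        data=json.dumps(doc).encode("utf-8"),
        headers=JSON_HEADERS,
        method="POST",
    )


def get_request(index, doc_id):
    """Retrieve: GET /<index>/_doc/<id> returns the stored document."""
    return Request(f"{BASE_URL}/{index}/_doc/{doc_id}", method="GET")


def search_request(index, field, text):
    """Search: POST /<index>/_search with a simple full-text match query."""
    body = {"query": {"match": {field: text}}}
    return Request(
        f"{BASE_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers=JSON_HEADERS,
        method="POST",
    )


# Hypothetical daily index and document, for illustration only.
index_name = "logstash-myapp-2024.02.16"
send = index_request(index_name, {"session_id": "abc-123", "env": "test"})
find = search_request(index_name, "session_id", "abc-123")
fetch = get_request(index_name, "1")

for req in (send, find, fetch):
    print(req.get_method(), req.full_url)
```

Passing any of these objects to urllib.request.urlopen would actually send it. In practice one of the official client libraries is usually more convenient than hand-built requests.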


Kibana

  • Visualization and reporting tool
  • Used with Elasticsearch to:
    • visualize the data
    • build interactive dashboards


Filebeat

  • Log shipper
  • Both Filebeat and Logstash can send logs from a file-based data source to a supported output destination
  • Filebeat is the lightweight option, ideal for environments with limited resources and basic log parsing needs; Logstash is tailored for scenarios that demand advanced log processing
  • Filebeat and Logstash can be used in tandem when building a logging pipeline with the ELK Stack, because they serve different functions: Filebeat collects and forwards log lines, and Logstash parses and transforms them
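
To make the "log shipper" role more concrete, here is a toy Python sketch of the core idea: forward only the lines appended since the last run and remember the read position in a small registry file (the same purpose sincedb_path serves in the Logstash file input). This is not Filebeat's actual implementation; the function name, the registry format and the temporary files are all made up for illustration.

```python
import json
import os
import tempfile


def ship_new_lines(log_path, registry_path, send):
    """Forward lines appended since the last call and persist the byte offset."""
    offset = 0
    if os.path.exists(registry_path):
        with open(registry_path) as registry:
            offset = json.load(registry)["offset"]

    with open(log_path, "rb") as log:
        log.seek(offset)
        for raw in log:
            if not raw.endswith(b"\n"):
                break  # partial line: wait until the writer finishes it
            send(raw.decode("utf-8").rstrip("\n"))
            offset += len(raw)

    with open(registry_path, "w") as registry:
        json.dump({"offset": offset}, registry)


# Demo with temporary files; "send" just collects lines in a list here,
# where a real shipper would forward them to Logstash or Elasticsearch.
workdir = tempfile.mkdtemp()
log_path = os.path.join(workdir, "app.log")
registry_path = os.path.join(workdir, "registry.json")
shipped = []

with open(log_path, "wb") as log_file:
    log_file.write(b"first\nsecond\n")
ship_new_lines(log_path, registry_path, shipped.append)

with open(log_path, "ab") as log_file:
    log_file.write(b"third\n")
ship_new_lines(log_path, registry_path, shipped.append)

print(shipped)
```

A real shipper also has to deal with log rotation, back-pressure and delivery guarantees, which is a large part of why a dedicated tool is used instead of a script like this.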
