Friday 16 February 2024

Introduction to ELK Stack





What is the ELK stack?

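The ELK stack is a set of tools used for searching, analyzing, and visualizing large volumes of data in near real time. Its name comes from its three main components:
  • Elasticsearch (search and analytics engine)
  • Logstash (data processing pipeline)
  • Kibana (visualization and dashboards)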

What is it used for?
  • aggregating logs from all systems and applications
  • log analytics
  • visualizations for application and infrastructure monitoring, faster troubleshooting, security analytics, etc.
image source: https://www.guru99.com/



Logstash


  • Server-side data processing pipeline that ingests (takes in) data from multiple sources simultaneously, transforms it, and then sends it to a "stash" like Elasticsearch. 
  • Supports a variety of input sources, such as:
    • log files (log shipper)
    • databases
    • message queues
  • Allows for complex data transformations and filtering
  • Makes it easy to transform source data and load it into an Elasticsearch cluster

Logstash configuration example:


# Sample Logstash configuration for a simple
# file -> Logstash -> Amazon Elasticsearch (OpenSearch) Service pipeline.

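input {
        # Tail the PM2 log file; the read position is saved in the sincedb file,
        # and "beginning" applies the first time the file is seen
        file {
                path => "/home/my-app/.pm2/logs/my-app-out.log"
                start_position => "beginning"
                sincedb_path => "/opt/logstash/sincedb-access"
        }
}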

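filter {
        # Parse the raw line: capture the timestamp, session ID and XML payload
        grok {
                match => { "message" => "%{DATA:timestamp} - info: processRequestMain: my-product: (input|output) sessionid = \{%{GREEDYDATA:session_id}\} (reqXml|resXml) = %{GREEDYDATA:content_xml}" }
        }

        # Discard lines that do not match the pattern
        if "_grokparsefailure" in [tags] {
                drop { }
        }

        # Parse the XML payload into the "content" field
        xml {
                source => "content_xml"
                target => "content"
        }

        # Emit one event per element of the [content][app] array
        split {
                field => "[content][app]"
        }

        # Tag each event with values taken from environment variables
        mutate {
                add_field => {
                        "env" => "${MYAPP_ENV}"
                        "instance_id" => "${MYAPP_INSTANCE_ID}"
                }
        }
}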

output {
        # Write to a daily index on an Amazon Elasticsearch (OpenSearch) Service domain
        amazon_es {
                hosts => [ "search-myapp-dev-af6m6cidasgqsnmskxup2fh57y.us-east-1.es.amazonaws.com" ]
                region => "us-east-1"
                index => "logstash-myapp-%{+YYYY.MM.dd}"
        }
}
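To make the filter section easier to follow, here is a rough Python sketch of what it does to a single log line: the grok pattern becomes a regular expression, non-matching lines are dropped, the XML payload is parsed, one event is produced per `<app>` element, and the two environment fields are attached. It is a simplified illustration, not Logstash's behaviour in every detail (for example, the real `xml` filter wraps values in arrays by default). The helper name `process_line`, the sample log line in the test, and the XML layout (a root element with `<app>` children) are assumptions.

```python
import os
import re
import xml.etree.ElementTree as ET

# Regex approximation of the grok pattern: %{DATA} is the lazy ".*?" and
# %{GREEDYDATA} the greedy ".*"; the two unnamed groups mirror (input|output)
# and (reqXml|resXml).
LINE_RE = re.compile(
    r"(?P<timestamp>.*?) - info: processRequestMain: my-product: (input|output) "
    r"sessionid = \{(?P<session_id>.*)\} (reqXml|resXml) = (?P<content_xml>.*)"
)


def process_line(line):
    """Hypothetical helper: turn one raw log line into zero or more events."""
    match = LINE_RE.search(line)  # grok searches the line rather than anchoring
    if match is None:
        # Equivalent of: if "_grokparsefailure" in [tags] { drop { } }
        return []
    # xml filter: parse the payload (the root element itself is not kept)
    root = ET.fromstring(match.group("content_xml"))
    events = []
    # split filter: one event per <app> child of the root element
    for app in root.findall("app"):
        events.append({
            "timestamp": match.group("timestamp"),
            "session_id": match.group("session_id"),
            "content": {"app": {child.tag: child.text for child in app}},
            # mutate/add_field: ${VAR} is substituted from the environment
            "env": os.environ.get("MYAPP_ENV", ""),
            "instance_id": os.environ.get("MYAPP_INSTANCE_ID", ""),
        })
    return events
```

A line whose XML payload contains two `<app>` elements therefore becomes two events that share the same timestamp and session ID, which is exactly why the `split` filter is useful before indexing.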


Elasticsearch


  • Distributed, RESTful search and analytics engine
  • Built on Apache Lucene
  • Used for storing (it is basically a database), searching, and analyzing large volumes of data (e.g. logs) quickly and in near real time
  • Scalable, fast, and able to handle complex queries
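  • Since version 7.11 (2021) licensed under the source-available Server Side Public License (SSPL) and Elastic License rather than an open-source license
    • OpenSearch is an open-source (Apache 2.0) fork of Elasticsearch and Kibana, launched by AWS
    • Fluentd is an open-source data collector that can take the place of Logstash (the "EFK" stack)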
  • Data is sent to Elasticsearch as JSON documents using:
    • the REST API
    • ingestion tools, e.g.:
      • Logstash (pushes logs to Elasticsearch)
      • Amazon Kinesis Data Firehose
  • Elasticsearch stores the original document and adds a searchable reference to it in the cluster's index (an inverted index that maps terms to the documents containing them)
  • The Elasticsearch REST API is used to work with documents:
    • index (send)
    • search
    • retrieve
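  • Uses schema-free JSON documents: field types are detected automatically (dynamic mapping) unless a mapping is defined explicitly
  • Distributed system
    • Each index is split into shards spread across the nodes of the cluster, which lets it process large volumes of data in parallel and quickly find the best matches for your queries
  • Reads and writes usually complete in under a second (newly indexed documents become searchable after the next refresh, every 1 second by default), so Elasticsearch suits near real-time use cases such as application monitoring and anomaly detection
  • Official clients are available for many languages, including Java, Python, PHP, JavaScript (Node.js), Ruby, Go, and .NET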
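As a sketch of what sending and searching JSON documents through the REST API involves, the Python below builds a body for the `_bulk` endpoint (newline-delimited JSON: an action line followed by the document, each ending with a newline) and a simple full-text `match` query for `_search`. The cluster URL `http://localhost:9200`, the index and field names, and the helper names are assumptions; the HTTP helper is only defined here, and a secured cluster would also need authentication.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: an unsecured local cluster


def bulk_body(index, docs):
    """Build an NDJSON body for POST /_bulk: an action line before each document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline


def match_query(field, text):
    """Build a full-text 'match' query body for POST /<index>/_search."""
    return {"query": {"match": {field: text}}}


def post(path, body, content_type="application/json"):
    """Send a POST request to the cluster and return the decoded JSON reply."""
    data = body if isinstance(body, str) else json.dumps(body)
    req = urllib.request.Request(
        ES_URL + path,
        data=data.encode("utf-8"),
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

With a cluster running, indexing might look like `post("/_bulk", bulk_body("logstash-myapp-2024.02.16", docs), "application/x-ndjson")`, searching like `post("/logstash-myapp-*/_search", match_query("message", "timeout"))`, and a single document can be retrieved by ID with a `GET /<index>/_doc/<id>` request.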

Kibana


  • Visualisation and reporting tool
  • Used with Elasticsearch to:
    • visualize the data
    • build interactive dashboards
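Kibana visualizations are built on Elasticsearch aggregation queries. As an illustration only, a "log events per day" bar chart corresponds roughly to a `date_histogram` aggregation like the one built below; the `@timestamp` field (which Logstash adds to each event), the one-day interval, and the helper names are assumptions, and the requests Kibana actually generates contain more options.

```python
def events_per_day_query(time_field="@timestamp", days=7):
    """Search body counting documents per calendar day over the last `days` days."""
    return {
        "size": 0,  # return only aggregation buckets, no individual hits
        "query": {"range": {time_field: {"gte": f"now-{days}d/d"}}},
        "aggs": {
            "events_per_day": {
                "date_histogram": {"field": time_field, "calendar_interval": "1d"}
            }
        },
    }


def buckets_to_series(response):
    """Turn a search response into (day, count) pairs, e.g. for a bar chart."""
    buckets = response["aggregations"]["events_per_day"]["buckets"]
    return [(b["key_as_string"], b["doc_count"]) for b in buckets]
```

Each bucket in the response is one bar of the chart: its key is the start of the day and `doc_count` is the number of log events in that day.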

Filebeat


  • https://www.elastic.co/beats/filebeat
  • Lightweight log shipper
  • Both Filebeat and Logstash can send logs from a file-based data source to a supported output destination
  • Filebeat is lightweight, which makes it ideal for environments with limited resources and basic log parsing needs; Logstash is tailored for scenarios that demand advanced log processing
  • Filebeat and Logstash are often used in tandem in an ELK logging pipeline because they serve different functions: Filebeat collects and forwards logs, while Logstash parses, transforms, and enriches them
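The core job of a log shipper, whether Filebeat (which records read offsets in its registry) or the Logstash `file` input (which uses the `sincedb_path` file from the configuration example), is to remember how far into each file it has already read, so that only new lines are forwarded after a restart. The Python sketch below shows that idea for a single file. It is a simplification (no log rotation handling, no real forwarding), and the function names are hypothetical.

```python
import os
import tempfile


def read_new_lines(path, offset):
    """Return (complete new lines, updated offset), reading `path` from a saved byte offset."""
    if os.path.getsize(path) < offset:
        offset = 0  # file is now shorter than the saved offset (e.g. truncated): start over
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Only complete lines are forwarded; a trailing partial line is re-read next time.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8").splitlines()
    return lines, offset + end


def demo():
    """Write to a temporary log file, then read it twice as a shipper would."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "app.log")
        with open(path, "wb") as f:
            f.write(b"first\nsecond\npart")
        lines, offset = read_new_lines(path, 0)  # ['first', 'second']
        with open(path, "ab") as f:
            f.write(b"ial\nthird\n")
        more, offset = read_new_lines(path, offset)  # ['partial', 'third']
        return lines, more
```

Persisting the returned offset (to a registry or sincedb-style file) between runs is what lets a shipper resume where it left off instead of re-sending the whole file.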


image source: https://www.guru99.com/



