
Friday, 16 February 2024

Introduction to ELK Stack





What is ELK stack?

The ELK stack is a set of tools used for searching, analyzing, and visualizing large volumes of data in real-time. It is composed of three main components: Elasticsearch, Logstash, and Kibana.

What is it used for?
  • aggregating logs from all systems and applications
  • log analytics
  • visualizations for application and infrastructure monitoring, faster troubleshooting, security analytics, etc.
image source: https://www.guru99.com/



Logstash


  • Server-side data processing pipeline that ingests (takes in) data from multiple sources simultaneously, transforms it, and then sends it to a "stash" like Elasticsearch
  • Supports a variety of input sources, such as:
    • log files (log shipper)
    • databases
    • message queues
  • Allows for complex data transformations and filtering
  • Makes it easy to transform source data and load it into an Elasticsearch cluster

Logstash configuration example:


# Sample Logstash configuration for creating a simple
# Beats -> Logstash -> Elasticsearch pipeline.

input {
        file {
                path => "/home/my-app/.pm2/logs/my-app-out.log"
                start_position => "beginning"
                sincedb_path => "/opt/logstash/sincedb-access"
        }
}

filter {
        grok {
                match => { "message" => "%{DATA:timestamp} - info: processRequestMain: my-product: (input|output) sessionid = \{%{GREEDYDATA:session_id}\} (reqXml|resXml) = %{GREEDYDATA:content_xml}" }
        }

        # Drop events that did not match the grok pattern
        if "_grokparsefailure" in [tags] {
                drop { }
        }

        xml {
                source => "content_xml"
                target => "content"
        }

        split {
                field => "[content][app]"
        }

        mutate {
                add_field => {
                        "env" => "${MYAPP_ENV}"
                        "instance_id" => "${MYAPP_INSTANCE_ID}"                
                }
        }
}

output {
    amazon_es {
        hosts => [ "search-myapp-dev-af6m6cidasgqsnmskxup2fh57y.us-east-1.es.amazonaws.com" ]
        region => "us-east-1"
        index => "logstash-myapp-%{+YYYY.MM.dd}"
    }
}


Elasticsearch


  • Distributed, RESTful search and analytics engine
  • Built on Apache Lucene
  • Used for storing (it is essentially a database), searching, and analyzing large volumes of data (e.g. logs) quickly and in near real-time
  • Scalable, fast, and able to handle complex queries
  • Licensed under the Elastic License, not open source
    • OpenSearch is an open-source fork (supported by AWS)
    • Fluentd is an open-source alternative on the data-collection side (comparable to Logstash rather than Elasticsearch)
  • Data in the form of JSON documents is sent to Elasticsearch using:
    • API
    • Ingestion tools
      • Logstash - e.g. for pushing logs to Elasticsearch
      • Amazon Kinesis Data Firehose
  • The original document is automatically stored and a searchable reference is added to the document in the cluster’s index
  • Elasticsearch's REST-based API is used to manipulate documents:
    • send
    • search
    • retrieve 
  • Uses schema-free JSON documents
  • Distributed system
    • Enables it to process large volumes of data in parallel, quickly finding the best matches for your queries
  • Operations such as reading or writing data usually take less than a second to complete => Elasticsearch can be used for near real-time use cases such as application monitoring and anomaly detection
  • Has client support for various languages: Java, Python, PHP, JavaScript/Node.js, Ruby, etc.
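
For illustration, here is a minimal sketch of what those REST calls look like (the index name, document ID, and fields are made up for this example; the request shapes follow Elasticsearch's document and search APIs):

# Index (send) a JSON document
PUT /logstash-myapp-2024.02.16/_doc/1
{
  "timestamp": "2024-02-16T10:00:00Z",
  "session_id": "abc123",
  "env": "dev"
}

# Retrieve it by ID
GET /logstash-myapp-2024.02.16/_doc/1

# Search the index using the query DSL
GET /logstash-myapp-2024.02.16/_search
{
  "query": {
    "match": { "session_id": "abc123" }
  }
}

These can be pasted into Kibana's Dev Tools console, or sent with curl against the cluster's HTTP endpoint.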

Kibana


  • Visualisation and reporting tool
  • Used with Elasticsearch to:
    • visualize the data
    • build interactive dashboards

Filebeat


  • https://www.elastic.co/beats/filebeat
  • log shipper
  • both Filebeat and Logstash can be used to send logs from a file-based data source to a supported output destination
  • Filebeat is a lightweight option, ideal for environments with limited resources and basic log parsing needs; conversely, Logstash is tailored for scenarios that demand advanced log processing
  • Filebeat and Logstash can also be used in tandem when building a logging pipeline with the ELK Stack, because each serves a different function
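
A minimal Filebeat configuration for that "in tandem" setup might look like this (the log paths and Logstash host are placeholders; the option names follow Filebeat's filebeat.yml format):

filebeat.inputs:
  - type: log
    paths:
      - /home/my-app/.pm2/logs/*.log

output.logstash:
  hosts: ["localhost:5044"]

On the Logstash side this pairs with a beats input, e.g. input { beats { port => 5044 } }, instead of the file input shown in the configuration example above.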


image source: https://www.guru99.com/



