Thursday 29 February 2024

Amazon Elastic Block Store (EBS)

 





EBS is block storage that can be attached to an EC2 instance and used as a virtual hard disk. Most EBS volume types can be up to 16 TiB in size (io2 volumes can be up to 64 TiB).

  • Part of EC2 ecosystem
  • Manages 3 entities:
    • Volumes
    • Snapshots
    • Lifecycle Manager
  • system storage for AWS EC2 VMs
  • reduces risk
  • durable
  • secure
  • avoid risks of physical media handling
  • 2 types:
    • Solid State Drive (SSD) - backed:
      • general purpose
      • provisioned IOPS
    • Hard Disk Drive (HDD) - backed:
      • Throughput optimized
      • Cold
  • An EBS volume can be attached only to an EC2 instance that is in the same Availability Zone [amazon web services - Is it possible to change the EBS volume to different availability zones? - Server Fault]
  • The Multi-Attach feature allows a single Provisioned IOPS SSD (io1 or io2) volume to be attached to up to 16 EC2 instances in the same Availability Zone, which provides higher application availability for Linux workloads

Data is broken down into blocks, and each block is stored as a separate piece with its own unique ID.
Unless Multi-Attach is used, only a single EC2 instance, in a single AZ, can access data on an EBS volume.

When we're launching a new EC2 instance, we need to specify the storage for the:
  • root volume
    • Contains the image used to boot the instance
    • Each instance has a single root volume
  • (optionally) more storage volumes
    • They can be added to EC2 instances when they are launched or after they are running
These volumes are basically "hard disks" that persistently store the OS and (our) applications across (EC2) virtual machine restarts.





Storage type
The storage type used for the volume.

EBS volumes are block-level storage volumes that persist independently from the lifetime of an EC2 instance, so you can stop and restart your instance at a later time without losing your data. You can also detach an EBS volume from one instance and attach it to another instance. EBS volumes are billed separately from the instance’s usage cost.

Instance store volumes are physically attached to the host computer. These volumes provide temporary block storage that persists only during the lifetime of the instance. If you stop, hibernate, or terminate an instance, data on instance store volumes is lost. The instance type determines the size and number of the instance store volumes available and the type of hardware used for the instance store volumes. Instance store volumes are included as part of the instance's usage cost.

Device name
The available device names for the volume.

The device name that you assign is used by Amazon EC2. The block device driver for the instance assigns the actual volume name when mounting the volume. The volume name assigned by the block device driver might differ from the device name that you assign.

The device names that you're allowed to assign depend on the virtualization type of the selected instance.

Snapshot
The snapshot from which to create the volume. A snapshot is a point-in-time backup of an EBS volume.

When you create a new volume from a snapshot, it's an exact copy of the original volume at the time the snapshot was taken.

EBS volumes created from encrypted snapshots are automatically encrypted and you can’t change their encryption status. EBS volumes created from unencrypted snapshots can be optionally encrypted.

Size (GiB)
The size of the volume, in GiB.

If you are creating the volume from a snapshot, then the size of the volume can’t be smaller than the size of the snapshot.

Supported volume sizes are as follows:
io1: 4 GiB to 16,384 GiB
io2: 4 GiB to 65,536 GiB
gp2 and gp3: 1 GiB to 16,384 GiB
st1 and sc1: 125 GiB to 16,384 GiB
Magnetic (standard): 1 GiB to 1024 GiB
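
The size limits above, together with the snapshot rule, can be expressed as a small lookup. This is only a sketch: the table is copied from the list above, and the function name and the "standard" key for Magnetic volumes are choices made for this example rather than anything AWS-defined here.

```python
from typing import Dict, Tuple

# Size limits in GiB per volume type, copied from the list above.
SIZE_LIMITS_GIB: Dict[str, Tuple[int, int]] = {
    "io1": (4, 16_384),
    "io2": (4, 65_536),
    "gp2": (1, 16_384),
    "gp3": (1, 16_384),
    "st1": (125, 16_384),
    "sc1": (125, 16_384),
    "standard": (1, 1_024),
}


def is_valid_size(volume_type: str, size_gib: int, snapshot_size_gib: int = 0) -> bool:
    """Return True if the size fits the volume type and is not smaller than the snapshot."""
    low, high = SIZE_LIMITS_GIB[volume_type]
    return low <= size_gib <= high and size_gib >= snapshot_size_gib
```

For example, is_valid_size("st1", 100) is False because st1 volumes start at 125 GiB.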


Volume type
The type of volume to attach. Volume types include:
  • General Purpose SSD (gp2 and gp3) volumes offer cost-effective storage that is ideal for a broad range of workloads.
  • Provisioned IOPS SSD (io1 and io2) volumes provide low latency and are designed to meet the needs of I/O-intensive workloads. They are best for EBS-optimized instances.
  • Throughput Optimized HDD (st1) volumes provide low-cost magnetic storage that is a good fit for large, sequential workloads.
  • Cold HDD (sc1) volumes provide low-cost magnetic storage that offers lower throughput than st1. sc1 is a good fit for large, sequential cold-data workloads that require infrequent access to data.
  • Magnetic (standard) volumes are best suited for workloads where data is accessed infrequently.
IOPS
The requested number of I/O operations per second that the volume can support.

It is applicable to Provisioned IOPS SSD (io1 and io2) and General Purpose SSD (gp2 and gp3) volumes only.

For Provisioned IOPS SSD volumes, io1 volumes support between 100 and 64,000 IOPS, and io2 volumes support between 100 and 256,000 IOPS, depending on the volume size. For io1 volumes, you can provision up to 50 IOPS per GiB. For io2 volumes, you can provision up to 1,000 IOPS per GiB.

For General Purpose SSD (gp2) volumes, baseline performance scales linearly at 3 IOPS per GiB from a minimum of 100 IOPS (at 33.33 GiB and below) to a maximum of 16,000 IOPS (at 5,334 GiB and above). General Purpose SSD (gp3) volumes support a baseline of 3,000 IOPS. Additionally, you can provision up to 500 IOPS per GiB up to a maximum of 16,000 IOPS.
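
The IOPS rules in the two paragraphs above are simple arithmetic, so a short sketch may help. It assumes only the numbers quoted above; the function names are invented for this example.

```python
from typing import Tuple

# (maximum IOPS, maximum IOPS per GiB) for Provisioned IOPS SSD volumes, as quoted above.
PROVISIONED_LIMITS = {"io1": (64_000, 50), "io2": (256_000, 1_000)}


def provisioned_iops_range(volume_type: str, size_gib: float) -> Tuple[int, int]:
    """Return the (minimum, maximum) IOPS that can be provisioned for an io1/io2 volume."""
    cap, per_gib = PROVISIONED_LIMITS[volume_type]
    return 100, int(min(cap, per_gib * size_gib))


def gp2_baseline_iops(size_gib: float) -> int:
    """gp2 baseline: 3 IOPS per GiB, never below 100 and never above 16,000."""
    return int(min(16_000, max(100, 3 * size_gib)))


def gp3_max_provisioned_iops(size_gib: float) -> int:
    """gp3: 3,000 IOPS baseline, up to 500 IOPS per GiB, capped at 16,000."""
    return int(min(16_000, max(3_000, 500 * size_gib)))
```

For example, a 1,000 GiB gp2 volume has a baseline of 3,000 IOPS, the same as the fixed gp3 baseline.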

Magnetic (standard) volumes deliver approximately 100 IOPS on average, with a burst capability of up to hundreds of IOPS.

For Throughput Optimized HDD (st1) and Cold HDD (sc1) volumes, performance is measured in throughput (MiB/s).

Delete on termination
Indicates whether the volume should be automatically deleted when the instance is terminated.

If you disable this feature, the volume will persist independently from the running life of an EC2 instance. When you terminate the instance, the volume will remain provisioned in your account. If you no longer need the volume after the instance has been terminated, you must delete it manually.

You can also change the delete on termination behavior after the instance has been launched.

Encrypted
The encryption status of the volume.

Amazon EBS encryption is an encryption solution for your EBS volumes. Amazon EBS encryption uses AWS KMS keys to encrypt volumes.

Considerations:
  • If your account is enabled for encryption by default, you can't create unencrypted volumes.
  • If you selected an encrypted snapshot, the volume is automatically encrypted.
  • If your account is not enabled for encryption by default, and you did not select a snapshot or you selected an unencrypted snapshot, encryption is optional.
  • You can create encrypted io2 volumes in any size and IOPS configuration. However, to create an encrypted volume that has a size greater than 16 TiB, or IOPS greater than 64,000, from an unencrypted snapshot or from a shared encrypted snapshot, you must first create an encrypted snapshot in your account and then use that snapshot to create the volume.
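
The first three considerations above amount to a small decision rule. The sketch below is one way to read them; the function and parameter names are made up for this example.

```python
from typing import Optional


def volume_is_encrypted(
    encryption_by_default: bool,
    snapshot_encrypted: Optional[bool],
    encrypt_requested: bool,
) -> bool:
    """Decide whether a new volume ends up encrypted.

    snapshot_encrypted is None when no snapshot was selected.
    """
    if encryption_by_default:
        return True  # unencrypted volumes can't be created in this account
    if snapshot_encrypted:
        return True  # volumes created from encrypted snapshots are always encrypted
    return encrypt_requested  # otherwise encryption is optional
```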

KMS key
The KMS key that will be used to encrypt the volume.

Amazon EBS encryption uses AWS KMS keys when creating encrypted volumes and snapshots. EBS encrypts your volume with a data key using the industry-standard AES-256 algorithm. Your data key is stored on disk with your encrypted data, but not before EBS encrypts it with your KMS key. Your data key never appears on disk in plaintext. The same data key is shared by snapshots of the volume and any subsequent volumes created from those snapshots.

Throughput
The throughput, in MiB/s, that the volume can support. This value can be provisioned for gp3 volumes.


If we click on "Add new volume", a Volume 2 (Custom) section appears:




EBS Volume Lifecycle



Here is the EBS Volume state diagram:

credit: View information about an Amazon EBS volume - Amazon EBS




Creating a volume snapshot


Why do we want to create an EBS volume snapshot?

If we terminate (intentionally or not) the EC2 instance, the root EBS volume (which might be the only one used by that EC2 instance) is deleted by default:


If we take a snapshot of the root EBS volume, we'll be able to restore that EC2 instance later.






Create a point-in-time snapshot to back up the data on an Amazon EBS volume to Amazon S3.

You can back up the data on your Amazon EBS volumes to Amazon S3 by taking point-in-time snapshots. Snapshots are incremental backups, which means that only the blocks on the device that have changed since the last snapshot are backed up. Each snapshot that you create contains all of the information that is needed to fully restore an EBS volume.
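
To illustrate what "incremental" means here, the following toy model treats a volume as a mapping from block number to block contents. It is not how EBS is implemented. It only shows that a later snapshot copies just the changed blocks while still describing the whole volume.

```python
from typing import Dict, List, Optional, Tuple

Blocks = Dict[int, bytes]  # block number -> block contents


def take_snapshot(volume: Blocks, previous: Optional[Blocks] = None) -> Tuple[Blocks, List[int]]:
    """Return (snapshot, copied_block_numbers).

    Unchanged blocks are reused from the previous snapshot; only new or
    changed blocks are copied, yet the snapshot still covers every block.
    """
    previous = previous or {}
    snapshot: Blocks = {}
    copied: List[int] = []
    for block_number, data in volume.items():
        if previous.get(block_number) == data:
            snapshot[block_number] = previous[block_number]  # reuse stored block
        else:
            snapshot[block_number] = data  # back up the changed block
            copied.append(block_number)
    return snapshot, sorted(copied)
```

With a three-block volume, the first snapshot copies all three blocks. After one block changes, the second snapshot copies only that block but can still restore the full volume.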

When you create a snapshot, only data that has already been written to the volume is backed up. This might exclude data that has been cached by any applications or the operating system. To ensure a consistent and complete snapshot, we recommend that you pause write operations to the volume or that you unmount the volume from the instance before creating the snapshot.

Snapshots that are taken from encrypted volumes are automatically encrypted. Volumes that are created from encrypted snapshots are also automatically encrypted.


---



Tuesday 27 February 2024

Introduction to MySQL DB

 



To connect to a MySQL instance:

% /opt/homebrew/opt/mysql-client/bin/mysql \
-u USER \
-pPASS \
-h DB_HOST \
DB_NAME
 
mysql: [Warning] Using a password on the command line interface can be insecure.
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1198
Server version: 8.0.35 Source distribution
Copyright (c) 2000, 2023, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> 



To show all databases in this MySQL server:

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| my_wordpress       |
| information_schema |
| mysql              |
| performance_schema |
| sys                |
+--------------------+
5 rows in set (0.09 sec)



To change the current database:

mysql> use my_wordpress;
Database changed


To list all columns in some table:

mysql> describe my_table;
+------------+---------------------+------+-----+---------------------+----------------+
| Field      | Type                | Null | Key | Default             | Extra          |
+------------+---------------------+------+-----+---------------------+----------------+
| id         | bigint(20) unsigned | NO   | PRI | NULL                | auto_increment |
| post_id    | bigint(20) unsigned | NO   | MUL | 0                   |                |
| post_type  | varchar(20)         | NO   |     | post                |                |
| created_at | datetime            | NO   |     | 0000-00-00 00:00:00 |                |
| author_id  | bigint(20) unsigned | NO   | MUL | 0                   |                |
| new        | tinyint(1)          | YES  |     | 0                   |                |
+------------+---------------------+------+-----+---------------------+----------------+
6 rows in set (0.003 sec)

Another way to find out the names of all columns:

mysql> select COLUMN_NAME from INFORMATION_SCHEMA.COLUMNS where TABLE_NAME='wp_postmeta';
+-------------+
| COLUMN_NAME |
+-------------+
| meta_id     |
| post_id     |
| meta_key    |
| meta_value  |
+-------------+
4 rows in set (0.08 sec)

Note that the columns returned might not be listed in the order in which they appear in the table (adding ORDER BY ORDINAL_POSITION to the query makes the order match the table definition). On another MySQL instance, the same query returned the columns in a different order:

MySQL [my_db]> select COLUMN_NAME from INFORMATION_SCHEMA.COLUMNS where TABLE_NAME='wp_postmeta';
+-------------+
| COLUMN_NAME |
+-------------+
| meta_id     |
| meta_key    |
| meta_value  |
| post_id     |
+-------------+
4 rows in set (0.002 sec)

To make sure you're using the right values for the right columns, it is best to check what a row looks like, e.g. the first one:

MySQL [my_db]> select * from wp_postmeta where meta_key like '%custom_string_%' LIMIT 1;
+---------+---------+---------------------------+------------+
| meta_id | post_id | meta_key                  | meta_value |
+---------+---------+---------------------------+------------+
|  225532 |    2289 | _custom_string_enabled    | 0          |
+---------+---------+---------------------------+------------+
1 row in set (0.034 sec)

To list all records (rows) that contain a field/attribute value that ends with some string, e.g. "origin" (% matches any sequence of zero or more characters):
 
mysql> select * from wp_postmeta where meta_key like '%origin';

+---------+---------+-----------------------------+---------------------+
| meta_id | post_id | meta_key                    | meta_value          |
+---------+---------+-----------------------------+---------------------+
| 1085763 |    8845 | origin                      |                     |
| 1085764 |    8845 | _origin                     | field_5d99d579566f1 |
...
| 5836494 |   88486 | origin                      | 1                   |
| 5836495 |   88486 | _origin                     | field_5d99d579566f1 |
+---------+---------+-----------------------------+---------------------+
2980 rows in set (2.08 sec)
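
The LIKE wildcards can be tried without a MySQL server. The sketch below uses Python's built-in sqlite3 module as a stand-in, on the assumption that % and _ behave the same way for these patterns. The table contents are invented sample rows, not data from the real database.

```python
import sqlite3
from typing import List


def keys_matching(pattern: str) -> List[str]:
    """Return the meta_key values that match a LIKE pattern, in insertion order."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE wp_postmeta ("
        "meta_id INTEGER PRIMARY KEY, post_id INTEGER, meta_key TEXT, meta_value TEXT)"
    )
    conn.executemany(
        "INSERT INTO wp_postmeta (post_id, meta_key, meta_value) VALUES (?, ?, ?)",
        [
            (8845, "origin", ""),
            (8845, "_origin", "field_5d99d579566f1"),
            (8845, "origin_country", "UK"),
            (2289, "_custom_string_enabled", "0"),
        ],
    )
    rows = conn.execute(
        "SELECT meta_key FROM wp_postmeta WHERE meta_key LIKE ? ORDER BY meta_id",
        (pattern,),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]
```

Here '%origin' matches keys that end with "origin", while 'origin%' matches keys that start with it. Note that _ is itself a wildcard for exactly one character.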

Sometimes we're interested only in the number of returned rows:

mysql> select count(*) from wp_postmeta where meta_key like '%origin';
+----------+
| count(*) |
+----------+
|     6444 |
+----------+
1 row in set (1.364 sec)


To delete the rows matched above:

mysql> delete from wp_postmeta where meta_key like '%origin';

Query OK, 2980 rows affected (2.81 sec)
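
Before a bulk delete it can be worth comparing the expected row count with the number of rows actually affected. The following is a minimal sketch of that habit using sqlite3 as a stand-in for MySQL. The helper names and the sample rows are assumptions made for the example.

```python
import sqlite3
from typing import Tuple


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory table with a few invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE wp_postmeta (meta_id INTEGER PRIMARY KEY, meta_key TEXT)")
    conn.executemany(
        "INSERT INTO wp_postmeta (meta_key) VALUES (?)",
        [("origin",), ("_origin",), ("origin_country",), ("_custom_string_enabled",)],
    )
    conn.commit()
    return conn


def delete_matching(conn: sqlite3.Connection, pattern: str) -> Tuple[int, int]:
    """Count matching rows, delete them in one transaction, return (expected, deleted)."""
    expected = conn.execute(
        "SELECT COUNT(*) FROM wp_postmeta WHERE meta_key LIKE ?", (pattern,)
    ).fetchone()[0]
    with conn:  # commits on success, rolls back if an exception is raised
        deleted = conn.execute(
            "DELETE FROM wp_postmeta WHERE meta_key LIKE ?", (pattern,)
        ).rowcount
        if deleted != expected:
            raise RuntimeError(f"expected {expected} rows, deleted {deleted}")
    return expected, deleted
```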


After deleting a large number of rows from a table, it's recommended to optimize the affected table in order to reclaim the unused space:

mysql> optimize table wp_postmeta;
+-----------------------------+----------+----------+-------------------------------------------------------------------+
| Table                       | Op       | Msg_type | Msg_text                                                          |
+-----------------------------+----------+----------+-------------------------------------------------------------------+
| my_wordpress.wp_postmeta    | optimize | note     | Table does not support optimize, doing recreate + analyze instead |
| my_wordpress.wp_postmeta    | optimize | status   | OK                                                                |
+-----------------------------+----------+----------+-------------------------------------------------------------------+
2 rows in set (34.30 sec)


To exit from the interactive terminal:

mysql> exit
Bye


To find out which tables contain some string:

% /opt/homebrew/opt/mysql-client/bin/mysqldump \
-u USER \
-pPASS \
-h DB_HOST \
--no-create-info \
--extended-insert=FALSE \
DB_NAME | grep STRING  > dump_STRING.txt

mysqldump: [Warning] Using a password on the command line interface can be insecure.
Warning: A partial dump from a server that has GTIDs will by default include the GTIDs of all transactions, even those that changed suppressed parts of the database. If you don't want to restore GTIDs, pass --set-gtid-purged=OFF. To make a complete dump, pass --all-databases --triggers --routines --events. 
Warning: A dump from a server that has GTIDs enabled will by default include the GTIDs of all transactions, even those that were executed during its extraction and might not be represented in the dumped data. This might result in an inconsistent data dump. 
In order to ensure a consistent backup of the database, pass --single-transaction or --lock-all-tables or --source-data.
 



---

Monday 26 February 2024

Introduction to Amazon Elastic Container Service (ECS)





  • container management service
  • highly scalable and fast
  • makes it easy to run, stop and manage containers on a cluster
  • integrated with the AWS Fargate serverless compute engine, which provisions and manages the underlying compute infrastructure so that we don't have to manage Amazon EC2 instances ourselves

AWS ECS is organised in the following groups:
  • Clusters
  • Namespaces
  • Task definitions

Clusters


A cluster is a logical grouping of services or standalone tasks.

The cluster list view provides a snapshot of the status of each of your clusters. This view displays the number of active services and the deployment status of all tasks within the cluster.


Namespaces


A namespace groups together Amazon ECS services to configure common connectivity. Amazon ECS can manage namespaces in AWS Cloud Map on your behalf.

The namespace list view provides a snapshot of each of your namespaces. This view displays the namespace ID in AWS Cloud Map, the short name of the namespace, and the date that it was created.

Use namespaces to correlate Amazon ECS services that connect to each other. Each service can be in a single namespace. A service can be in the default namespace configured in the cluster, or specify a different namespace. The namespace must be in the same AWS Region as the Amazon ECS service and cluster. The type of namespace in AWS Cloud Map doesn't affect Service Connect.

Amazon ECS can create a namespace as you create a cluster, or you can assign a default namespace to an existing cluster at any time. Services that you create in these clusters can connect to the other services in the namespace without additional configuration. Additional configuration of a domain name and port is required when you want to make a service available for your other services to connect to.

Task definitions


The Task definitions view lists each task definition family you've created.
You can perform the following actions:
  • Deploy the task definition as a service or a task.
  • Create a new revision.


Creating a new Task definition




Family (Info):

A task definition family is used to group multiple versions, also referred to as revisions, of the same task definition. The first task definition that is registered into a particular family is given a revision of 1, and any task definitions registered after that are given a sequential revision number.


Launch type (Info):

The Launch type specified for a task definition determines where Amazon ECS launches the task or service. The task definition parameters are validated against the allowed values for the launch type.

By default, the AWS Fargate option is selected. You can also select Amazon EC2 instances.

Amazon ECS returns an error if the task definition is not valid for use on the infrastructure type specified when creating a service or running a task.

This field corresponds to the requiresCompatibilities task definition parameter.

Operating system/Architecture (Info):

The Operating system/Architecture configuration for the task definition defines the operating system and the CPU architecture that your tasks run on. When you have multiple tasks that are part of a service, the tasks must all have the same configuration for this option. To use the 64-bit ARM CPU architecture, select Linux/ARM64.


Network mode (Info):

The network mode specifies what type of networking the containers in the task use. The following are available:
  • The awsvpc network mode, which provides the task with an elastic network interface (ENI). When creating a service or running a task with this network mode you must specify a network configuration consisting of one or more subnets, security groups, and whether to assign the task a public IP address.
    • The awsvpc network mode is required for tasks hosted on Fargate.
  • The bridge network mode uses Docker's built-in virtual network, which runs inside each Amazon EC2 instance hosting the task. The bridge is an internal network namespace that allows each container connected to the same bridge network to communicate with each other. It provides an isolation boundary from containers that aren't connected to the same bridge network. You use static or dynamic port mappings to map ports in the container with ports on the Amazon EC2 host.
    • If you choose bridge for the network mode, under Port mappings, for Host port, specify the port number on the container instance to reserve for your container.
  • The default mode uses Docker's built-in virtual network mode on Windows, which runs inside each Amazon EC2 instance that hosts the task. This is the default network mode on Windows if a network mode isn't specified in the task definition.
  • The host network mode has the task bypass Docker's built-in virtual network and maps container ports directly to the ENI of the Amazon EC2 instance hosting the task. As a result, you can't run multiple instantiations of the same task on a single Amazon EC2 instance when port mappings are used.
  • The none network mode provides a task with no external network connectivity.
For tasks hosted on Amazon EC2 instances, the available network modes are awsvpc, bridge, host, and none. If no network mode is specified, the bridge network mode is used by default.
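
The rules above can be summarised in a few lines. This sketch covers only the Fargate and Linux EC2 cases described above (the Windows "default" mode is left out), and the function name is invented for the example.

```python
from typing import Optional

EC2_NETWORK_MODES = ("awsvpc", "bridge", "host", "none")


def resolve_network_mode(launch_type: str, requested: Optional[str] = None) -> str:
    """Return the network mode a task would use, or raise ValueError if it is not allowed."""
    if launch_type == "FARGATE":
        if requested not in (None, "awsvpc"):
            raise ValueError("Fargate tasks require the awsvpc network mode")
        return "awsvpc"
    if launch_type == "EC2":
        if requested is None:
            return "bridge"  # used when no network mode is specified
        if requested not in EC2_NETWORK_MODES:
            raise ValueError(f"unsupported network mode: {requested}")
        return requested
    raise ValueError(f"unknown launch type: {launch_type}")
```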


Task size

For task size, specify the amount of CPU and memory to reserve for the task. The CPU value is specified as a number of vCPUs. The memory value is specified in GB.

For Amazon ECS tasks hosted on AWS Fargate, the task CPU and memory values are required and there are specific values for both CPU and memory that are supported.

  • For .25 vCPU, the valid memory values are .5 GB, 1 GB, or 2 GB.
  • For .5 vCPU, the valid memory values are 1 GB, 2 GB, 3 GB, or 4 GB.
  • For 1 vCPU, the valid memory values are 2 GB, 3 GB, 4 GB, 5 GB, 6 GB, 7 GB, or 8 GB.
  • For 2 vCPU, the valid memory values are between 4 GB and 16 GB in 1 GB increments.
  • For 4 vCPU, the valid memory values are between 8 GB and 30 GB in 1 GB increments.
  • For 8 vCPU, the valid memory values are between 16 GB and 60 GB in 4 GB increments. This option requires Linux platform 1.4.0 or later.
  • For 16 vCPU, the valid memory values are between 32 GB and 120 GB in 8 GB increments. This option requires Linux platform 1.4.0 or later.
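
The valid Fargate combinations listed above are easy to get wrong by hand, so here is a small lookup sketch. The table is transcribed from the list above; the function name is an assumption made for this example.

```python
from typing import Dict, List

# vCPU -> valid memory values in GB, transcribed from the list above.
FARGATE_MEMORY_GB: Dict[float, List[float]] = {
    0.25: [0.5, 1, 2],
    0.5: [1, 2, 3, 4],
    1: list(range(2, 9)),         # 2 GB to 8 GB in 1 GB steps
    2: list(range(4, 17)),        # 4 GB to 16 GB in 1 GB steps
    4: list(range(8, 31)),        # 8 GB to 30 GB in 1 GB steps
    8: list(range(16, 61, 4)),    # 16 GB to 60 GB in 4 GB steps
    16: list(range(32, 121, 8)),  # 32 GB to 120 GB in 8 GB steps
}


def is_valid_fargate_size(vcpu: float, memory_gb: float) -> bool:
    """Return True if the vCPU/memory pair is one of the supported combinations."""
    return memory_gb in FARGATE_MEMORY_GB.get(vcpu, [])
```

For example, 8 vCPU with 20 GB is valid, but 8 vCPU with 18 GB is not, because memory for 8 vCPU goes up in 4 GB increments.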

For Amazon ECS tasks hosted on Amazon EC2 instances, the task size fields are optional. If your cluster doesn't have any registered container instances with the requested CPU units available, the task fails. Supported values are between 128 CPU units (0.125 vCPUs) and 10240 CPU units (10 vCPUs). To specify the memory value in GB, enter GB after the value. For example, to set the Memory value to 3GB, enter 3GB.

Task role

The task role is an IAM role that is used by containers in a task to make AWS API calls on your behalf. Applications must sign their AWS API requests with AWS credentials, and a task role provides a strategy for managing credentials for your applications to use, similar to the way that Amazon EC2 instance profiles provide credentials to Amazon EC2 instances.

A task IAM role is required when using the AWS Distro for OpenTelemetry integration to collect trace data or metrics.


Task execution role

The task execution role is an IAM role that grants the Amazon ECS container and Fargate agents permission to make AWS API calls on your behalf.

To use the task execution role, you must run container agent 1.16.0 or later.

The following are common use cases for a task execution AWS Identity and Access Management (IAM) role:
  • Your task is hosted on AWS Fargate or on an external instance and it does the following:
    • Pulls a container image from an Amazon ECR private repository.
    • Sends container logs to Amazon CloudWatch Logs by using the awslogs log driver.
  • Your tasks are hosted on either AWS Fargate or Amazon EC2 instances and they do the following:
    • Use private registry authentication.
    • Reference sensitive data in the task definition by using AWS Secrets Manager secrets or AWS Systems Manager Parameter Store parameters.



Container

A container definition provides details and resource requirements for a container that is passed to the Docker daemon. A task definition may contain one or more container definitions.

For applications that require multiple containers, you should group the containers in the same task definition under the following conditions.
  • If the containers share a common lifecycle. For example, if they must launch or be terminated together.
  • If the containers must share the same resources or data volumes.
  • If the containers must run on the same underlying host. For example, if one container references the other on a localhost port.
Image URI e.g. 






Command can be used for executing an entrypoint, for example: /usr/local/my-app/entrypoint.sh
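
The console fields described so far map onto the JSON form of a task definition. The sketch below builds such a document as a Python dictionary. The family name, role ARNs, account ID, log group and Region are placeholders invented for this example, and only a small subset of the available parameters is shown.

```python
import json
from typing import Any, Dict


def build_task_definition(image_uri: str) -> Dict[str, Any]:
    """Return a minimal Fargate task definition with a single container."""
    return {
        "family": "my-app",  # placeholder family name
        "requiresCompatibilities": ["FARGATE"],  # the Launch type field
        "networkMode": "awsvpc",  # required for Fargate
        "runtimePlatform": {
            "operatingSystemFamily": "LINUX",
            "cpuArchitecture": "ARM64",
        },
        "cpu": "256",     # .25 vCPU expressed in CPU units
        "memory": "512",  # .5 GB expressed in MiB
        # Placeholder ARNs: substitute real roles.
        "taskRoleArn": "arn:aws:iam::123456789012:role/my-app-task-role",
        "executionRoleArn": "arn:aws:iam::123456789012:role/my-app-execution-role",
        "containerDefinitions": [
            {
                "name": "my-app",
                "image": image_uri,
                "essential": True,
                "command": ["/usr/local/my-app/entrypoint.sh"],
                "logConfiguration": {
                    "logDriver": "awslogs",
                    "options": {
                        "awslogs-group": "/ecs/my-app",  # placeholder log group
                        "awslogs-region": "us-east-1",
                        "awslogs-stream-prefix": "ecs",
                    },
                },
            }
        ],
    }


def to_json(task_definition: Dict[str, Any]) -> str:
    """Serialise the task definition, e.g. to save it to a file."""
    return json.dumps(task_definition, indent=2)
```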






Deregister

When a task definition revision is deregistered, the revision transitions to an INACTIVE state. Existing tasks and services that use the inactive task definition revision continue to run without disruption.
Inactive revisions can't be used to run new tasks or create new services, and you can't update an existing service to use an inactive revision.
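
Revision numbering and deregistration can be pictured with a toy model. This is not the ECS API, just an illustration of the behaviour described above. The class and method names are invented.

```python
from typing import Dict


class TaskDefinitionFamily:
    """Toy model of a task definition family and its revisions."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.status: Dict[int, str] = {}  # revision number -> ACTIVE or INACTIVE

    def register(self) -> int:
        """Register a new revision; numbers are sequential, starting at 1."""
        revision = len(self.status) + 1
        self.status[revision] = "ACTIVE"
        return revision

    def deregister(self, revision: int) -> None:
        """Mark a revision INACTIVE; it keeps its number."""
        self.status[revision] = "INACTIVE"

    def can_run_new_task(self, revision: int) -> bool:
        """Only ACTIVE revisions can be used for new tasks or services."""
        return self.status.get(revision) == "ACTIVE"
```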



Creating a cluster





Infrastructure (Info):

From the Infrastructure workflow, you can configure the infrastructure where your containers run.

The valid options are:

  • AWS Fargate
    • Fargate is a serverless, pay-as-you-go compute engine. With Fargate you don't need to manage servers, handle capacity planning, or isolate container workloads for security.
  • Amazon EC2 instances
    • You choose the instance type, the number of instances, and manage the capacity.
  • External instances using ECS Anywhere
    • Amazon ECS Anywhere provides support for registering an external instance such as an on-premises server or virtual machine (VM), to your Amazon ECS cluster.

By default, when you create a cluster, the cluster is configured for AWS Fargate.

To use EC2 instances, clear AWS Fargate and select Amazon EC2 instances. When you add EC2 instances, you can use an existing group, or create a new Auto Scaling group to act as the capacity provider.

To use your on-premises servers, clear AWS Fargate and select External instances using ECS Anywhere. When the cluster creation is complete, go to Cluster details page to generate the registration command for your external instances, and then run the command on all your external instances.

Monitoring (Info):

From the Monitoring workflow, you can turn on CloudWatch Container Insights.

CloudWatch Container Insights comes at an additional cost and is a fully managed service. It automatically collects, aggregates, and summarizes Amazon ECS metrics and logs. It provides the following information for clusters and services with tasks in the RUNNING state:
  • CPU and memory utilization
  • The number of tasks and services
  • Read and write storage
  • Network transmit and receive rates (for tasks that use the bridge or awsvpc network mode)
  • Container instance counts for clusters, services, and tasks
You can view the metrics in the CloudWatch Container Insights dashboard and perform the following operations:
  • Query and analyze container application logs by integrating with CloudWatch Container Insights logs.
  • Create CloudWatch alarms so that you can track issues.

Inside the cluster, we create a service which runs a task defined via the selected task definition (in Deployment configuration >> Task definition >> Family):














This is a list of all AWS resources involved in running a (Docker) container in one ECS cluster:

  • Task definition
    • Revisions
    • Task role
      • policies: ...
        • resources: ...
    • Task execution role
    • Containers
      • Container
        • Image. This is an Amazon Elastic Container Registry (Amazon ECR) image URI e.g. 03623477220.dkr.ecr.us-east-1.amazonaws.com/my-app:36 (36 is its tag, which is usually the Docker image version)
          • In Amazon ECR >> Private registry >> Repositories: my-app
        • Log configuration
          • awslogs-group (In CloudWatch >> Log groups)
  • Cluster - contains services
    • Service
      • Task definition: revision <-- this is the link between a Cluster and Task definition
      • Network (VPC)
      • Subnets
      • Security groups
      • Service role e.g. AWSServiceRoleForECS (AWS-defined role.  Amazon ECS uses the service-linked role named AWSServiceRoleForECS to enable Amazon ECS Service to call AWS APIs on your behalf.)
      • Load balancers
  • Namespace
---



Thursday 22 February 2024

Introduction to AWS CloudFront


 

What is AWS CloudFront?

  • Content delivery network (CDN) provided by AWS

Why use it?

  • To speed up delivery of web content (dynamic, static, streaming, interactive)
  • Content is distributed with low latency and high data transfer speeds

How does it work?

  • Files are delivered to end-users using a global network of edge locations
  • Users who request web content are automatically routed to the edge location that gives them the lowest latency.
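
The routing idea can be shown with a toy example. The edge location names and latencies below are invented, and the real routing logic is internal to CloudFront. The sketch only captures "pick the location with the lowest latency".

```python
from typing import Dict


def pick_edge_location(latency_ms: Dict[str, float]) -> str:
    """Return the edge location with the lowest measured latency."""
    return min(latency_ms, key=latency_ms.get)


# Hypothetical measurements for one user.
measurements = {"London": 12.0, "Frankfurt": 21.5, "Virginia": 85.0}
closest = pick_edge_location(measurements)
```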

How to set it up for selected content?

  • Create a distribution and specify settings for it
    • Amazon S3 bucket or HTTP server that we want CloudFront to get the content from
    • whether we want only selected users to have access to that content
    • whether we want users to use HTTPS
    • Alternate domain name (CNAME). This optional setting is a custom domain name that we use in URLs for the files served by this distribution. Example: my-content.example.com.
  • CloudFront then assigns a domain name to the distribution (e.g. abcdef0123456.cloudfront.net), but it's possible to use a custom domain name (e.g. example.com)
  • We can now access our resource via URL:
    • http://abcdef0123456.cloudfront.net/index.html or
    • http://example.com/index.html
  • ...
CloudFront >> Distributions are not region-specific; they are global.

Friday 16 February 2024

Introduction to ELK Stack





What is ELK stack?

The ELK stack is a set of tools used for searching, analyzing, and visualizing large volumes of data in real-time. It is composed of three main components: Elasticsearch, Logstash, and Kibana.

What is it used for?
  • aggregates logs from all systems and applications
  • log analytics
  • visualizations for application and infrastructure monitoring, faster troubleshooting, security analytics etc.
image source: https://www.guru99.com/



Logstash


  • Server-side data processing pipeline that ingests (takes in) data from multiple sources simultaneously, transforms it, and then sends it to a "stash" like Elasticsearch. 
  • Supports a variety of input sources, such as:
    • log files (log shipper)
    • databases
    • message queues
  • Allows for complex data transformations and filtering
  • Helps easily transform source data and load it into Elasticsearch cluster

Logstash configuration examples:


# Sample Logstash configuration for creating a simple
# log file -> Logstash -> Amazon Elasticsearch Service pipeline.

input {
        file {
                path => "/home/my-app/.pm2/logs/my-app-out.log"
                start_position => "beginning"
                sincedb_path => "/opt/logstash/sincedb-access"
        }
}

filter {
        grok {
                match => { "message" => "%{DATA:timestamp} - info: processRequestMain: my-product: (input|output) sessionid = \{%{GREEDYDATA:session_id}\} (reqXml|resXml) = %{GREEDYDATA:content_xml}" }
        }

        if "_grokparsefailure" in [tags] {
                drop { }
        }  

        xml {
                source => "content_xml"
                target => "content"
        }

        split {
                field => "content[app]"
        }

        mutate {
                add_field => {
                        "env" => "${MYAPP_ENV}"
                        "instance_id" => "${MYAPP_INSTANCE_ID}"                
                }
        }
}

output {
    amazon_es {
        hosts => [ "search-myapp-dev-af6m6cidasgqsnmskxup2fh57y.us-east-1.es.amazonaws.com" ]
        region => "us-east-1"
        index => "logstash-myapp-%{+YYYY.MM.dd}"
    }
}
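
To see what the grok pattern in the filter above extracts, it can be approximated with a regular expression. The sketch below translates DATA to a lazy .*? and GREEDYDATA to a greedy .* and also mimics the daily index name. The sample log line in the usage note is invented, and this is an illustration rather than a replacement for Logstash.

```python
import re
from datetime import datetime, timezone
from typing import Dict, Optional

# Regex approximation of the grok pattern used in the filter above.
LINE_PATTERN = re.compile(
    r"(?P<timestamp>.*?) - info: processRequestMain: my-product: (?:input|output) "
    r"sessionid = \{(?P<session_id>.*)\} (?:reqXml|resXml) = (?P<content_xml>.*)"
)


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Return the extracted fields, or None (the equivalent of a grok parse failure)."""
    match = LINE_PATTERN.search(line)
    return match.groupdict() if match else None


def index_name(event_time: datetime) -> str:
    """Mimic the logstash-myapp-%{+YYYY.MM.dd} index name for a timezone-aware time."""
    return f"logstash-myapp-{event_time.astimezone(timezone.utc):%Y.%m.%d}"
```

For a made-up line such as "2024-02-16T10:00:00Z - info: processRequestMain: my-product: input sessionid = {abc-123} reqXml = <req><app>a</app></req>", parse_line returns the timestamp, session_id and content_xml fields. Lines that don't match return None, which corresponds to the events that the config drops.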


Elasticsearch


  • Distributed, RESTful search and analytics engine
  • Built on Apache Lucene
  • Used for storing (it is basically a Database), searching, and analyzing large volumes of data (e.g. logs) quickly and in near real-time
  • Scalable, fast, and able to handle complex queries
  • Licensed, not open source
    • OpenSearch is an open-source alternative (supported by AWS)
    • Fluentd is an open-source alternative for data collection (i.e. an alternative to Logstash)
  • Data in the form of JSON documents is sent to Elasticsearch using:
    • API
    • Ingestion tools
      • Logstash - e.g. it pushes logs to Elasticsearch
      • Amazon Kinesis Data Firehose
  • The original document is automatically stored and a searchable reference is added to the document in the cluster’s index
  • The Elasticsearch REST-based API is used to manipulate documents:
    • send
    • search
    • retrieve 
  • Uses schema-free JSON documents
  • Distributed system
    • Enables it to process large volumes of data in parallel, quickly finding the best matches for your queries
  • Operations such as reading or writing data usually take less than a second to complete => Elasticsearch can be used for near real-time use cases such as application monitoring and anomaly detection
  • Has client support for various languages: Java, Python, PHP, JavaScript, Node.js, Ruby etc.
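
The send, search and retrieve operations correspond to a handful of REST calls. The sketch below only builds the requests (method, path and JSON body) without sending them, so it runs without a cluster. The index name, document ID and field names in the usage note are invented.

```python
import json
from typing import Any, Dict, Optional, Tuple

Request = Tuple[str, str, Optional[str]]  # (HTTP method, path, JSON body)


def index_document(index: str, doc_id: str, document: Dict[str, Any]) -> Request:
    """Send (index) a JSON document under a given ID."""
    return ("PUT", f"/{index}/_doc/{doc_id}", json.dumps(document))


def get_document(index: str, doc_id: str) -> Request:
    """Retrieve a document by ID."""
    return ("GET", f"/{index}/_doc/{doc_id}", None)


def search(index: str, field: str, text: str) -> Request:
    """Full-text search on one field using a match query."""
    return ("POST", f"/{index}/_search", json.dumps({"query": {"match": {field: text}}}))
```

For example, search("logstash-myapp-2024.02.16", "session_id", "abc-123") describes a POST to /logstash-myapp-2024.02.16/_search with a match query in the body.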

Kibana


  • Visualisation and reporting tool
  • Used with Elasticsearch to:
    • visualize the data
    • build interactive dashboards

Filebeat


  • https://www.elastic.co/beats/filebeat
  • log shipper
  • both Filebeat and Logstash can be used to send logs from a file-based data source to a supported output destination
  • Filebeat is a lightweight option, ideal for environments with limited resources and basic log parsing needs. Conversely, Logstash is tailored for scenarios that demand advanced log processing
  • Filebeat and Logstash can be used in tandem when building a logging pipeline with the ELK Stack because they serve different functions


image source: https://www.guru99.com/







Friday 2 February 2024

Installing Graphviz on macOS

I wanted to try the terraform graph command (see Command: graph | Terraform | HashiCorp Developer) by cd-ing into an arbitrary Terraform module and executing:

% terraform graph -type=plan | dot -Tpng >graph.png

But this issued an error:

zsh: command not found: dot

Solution:

% brew install graphviz  
...
==> Installing graphviz
==> Pouring graphviz--9.0.0.arm64_ventura.bottle.tar.gz
🍺  /opt/homebrew/Cellar/graphviz/9.0.0: 287 files, 7.1MB
==> Running `brew cleanup graphviz`...
Disable this behaviour by setting HOMEBREW_NO_INSTALL_CLEANUP.
Hide these hints with HOMEBREW_NO_ENV_HINTS (see `man brew`).


To verify installation:

% dot --version 
dot - graphviz version 9.0.0 (20230911.1827)