My Public Notepad
Bits and bobs about computers and programming
Tuesday, 8 July 2025
How to install MongoDB Shell (mongosh) on Mac
Tap is a package source (formula repository).
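A minimal sketch of the usual Homebrew steps, assuming MongoDB's documented mongodb/brew tap and mongosh formula:

brew tap mongodb/brew      # add MongoDB's tap (package source)
brew install mongosh       # install MongoDB Shell from that tap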
Monday, 30 June 2025
Introduction to Amazon API Gateway
Amazon API Gateway:
- fully managed service to create, publish, maintain, monitor, and secure APIs at any scale
- APIs act as the "front door" for applications to access data, business logic, or functionality from our backend services
- allows creating:
- RESTful APIs:
- HTTP APIs - optimized for serverless workloads and HTTP backends; they act as triggers for Lambda functions
- HTTP APIs are the best choice for building APIs that only require API proxy functionality
- REST APIs - use them if our APIs require both API proxy functionality and API management features in a single solution
- WebSocket APIs that enable real-time two-way communication applications
- supports:
- containerized workloads
- serverless workloads
- web applications
- handles all the tasks involved in accepting and processing up to hundreds of thousands of concurrent API calls, including:
- traffic management
- CORS support
- authorization and access control
- throttling
- monitoring
- API version management
- has no minimum fees or startup costs. We pay for the API calls we receive and the amount of data transferred out; with the API Gateway tiered pricing model, we can reduce our costs as API usage scales
RESTful APIs
REST API endpoints (apiGateway):
- Older and feature-rich: supports API keys, usage plans, request/response validation, custom authorizers, and more.
- More configuration options, but higher latency and cost.
- Defined under the provider.apiGateway section and function events: http.
HTTP API endpoints (httpApi):
- Newer, simpler, faster, and cheaper.
- Supports JWT/Lambda authorizers, CORS, and OIDC, but lacks some advanced REST API features.
- Defined under provider.httpApi and function events: httpApi.
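To illustrate the difference in a Serverless Framework configuration, a minimal sketch (the function names, handlers, and paths are made up for the example):

functions:
  helloHttpApi:
    handler: src/hello.handler
    events:
      - httpApi:            # HTTP API endpoint (newer, cheaper, lower latency)
          path: /hello
          method: get
  helloRestApi:
    handler: src/hello.handler
    events:
      - http:               # REST API endpoint (more API management features)
          path: hello
          method: get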
Friday, 27 June 2025
GitHub Workflows and AWS
IAM User Authentication
OpenID Connect (OIDC) Authentication
How it works:
- GitHub OIDC Provider: GitHub acts as an OIDC provider, issuing signed JWTs (JSON Web Tokens) to workflows that request them.
- configure-aws-credentials Action: This action, when invoked in a GitHub Actions workflow, receives the JWT from the OIDC provider.
- AWS STS Request: The action then uses the JWT to request temporary security credentials from AWS Security Token Service (STS).
- Credential Injection: AWS STS returns temporary credentials (access key ID, secret access key, and session token) which the action injects as environment variables into the workflow's execution environment.
- AWS SDKs and CLI: AWS SDKs and the AWS CLI automatically detect and use these environment variables for authenticating with AWS services.
Benefits:
- Enhanced Security: Eliminates the need to store long-lived AWS access keys, reducing the risk of compromise.
- Simplified Credential Management: Automatic retrieval and injection of temporary credentials, simplifying workflow setup and maintenance.
- Improved Auditing: Provides better traceability of actions performed within AWS, as the identity is linked to the GitHub user or organization.
Setup steps:
- Configure an OpenID Connect provider in AWS: We need to establish an OIDC trust relationship between GitHub and our AWS account.
- Create an IAM role in AWS: Define the permissions for the role that the configure-aws-credentials action will assume.
- Set up the GitHub workflow: Configure the configure-aws-credentials action with the appropriate parameters, such as the AWS region and the IAM role to assume (see the workflow sketch below).
Environment variables injected by the action:
- AWS_ACCESS_KEY_ID: This environment variable stores the access key ID of the temporary credentials.
- AWS_SECRET_ACCESS_KEY: This environment variable stores the secret access key of the temporary credentials.
- AWS_SESSION_TOKEN: This environment variable stores the session token associated with the temporary credentials, which must accompany the temporary access key pair when calling AWS services.
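A minimal workflow sketch of the setup above (the role ARN, region, and names are placeholders; the id-token: write permission is required so the workflow can request the OIDC JWT):

name: deploy
on: push

permissions:
  id-token: write    # allow the workflow to request the OIDC JWT
  contents: read     # allow actions/checkout to read the repository

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Configure AWS credentials via OIDC
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/github-actions-deploy   # placeholder IAM role
          aws-region: us-east-1
      - name: Verify assumed identity
        run: aws sts get-caller-identity    # uses the injected AWS_* environment variables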
Friday, 13 June 2025
Introduction to Serverless Framework
- sls deploy is idempotent for infrastructure: Re-running it with no changes is safe and does not cause duplicate resources or unintended side effects at the CloudFormation level.
- Application-level idempotency is our responsibility: Ensure our Lambda functions and integrations handle repeated events if that is a requirement for our use case
Serverless Yaml Configuration File
An example serverless application might consist of:
- a database, e.g. DynamoDB
- a REST API, e.g. one which handles the submitted web form and stores the data in DynamoDB
- a front-end website, e.g. a React app hosted in an S3 bucket
- service: name of the service
- useDotenv: boolean (true|false)
- configValidationMode: error
- frameworkVersion: e.g. "3"
- provider:
- name: provider name, e.g. aws
- runtime: e.g. nodejs18.x
- region: e.g. us-east-1
- memorySize: how much memory the Lambda execution environment will have, e.g. 1024 (MB). It is good to check the actual memory usage and adjust the configured memory size - downsizing can lower the costs!
- timeout: (number) e.g. 60 [seconds] - the maximum amount of time, in seconds, that a serverless function (such as an AWS Lambda function) is allowed to run before it is forcibly terminated by the platform. If the function execution exceeds this limit, the platform automatically stops it and returns a timeout error. The timeout property is used to control resource usage and prevent runaway executions, which is especially important for functions that interact with external services or perform long-running tasks. If not specified, AWS Lambda uses a default timeout of 3 seconds; the maximum is 900 seconds (15 minutes).
- httpApi:
- id:
- apiGateway:
- minimumCompressionSize: 1024
- shouldStartNameWithService: true
- restApiId: ""
- restApiRootResourceId: ""
- stage: name of the environment, e.g. production
- iamManagedPolicies: a list of ARNs of managed policies that will be attached to the Lambda functions' IAM role, e.g. a policy which allows access to S3 buckets, etc.
- lambdaHashingVersion
- environment: dictionary of environment variable names and values
- vpc
- securityGroupIds: list
- subnetIds - typically a list of private subnets with NAT gateway.
- functions: a dictionary which defines the AWS Lambda functions that are deployed as part of this Serverless service.
- <function_name>: a logical name of the function (e.g. my-function). This name is used to reference the function within the Serverless Framework and in deployment outputs. The name of the provisioned Lambda function has the format <service_name>-<stage>-<function_name>. Each function entry under functions specifies:
- handler - specifies the entry point for the Lambda function, i.e. which file and exported function AWS Lambda executes when the function is invoked (e.g. src/fn/lambda.handler points to the handler export in the src/fn/lambda module).
- events - (optional, array) a list of events that trigger this function (see the sketch after this list)
- Some triggers:
- schedule, scheduled events: for periodic invocation (cron-like jobs)
- sns: for invocation via an AWS SNS topic
- HTTP endpoints,
- S3 events
- messages from a Kafka topic in an MSK cluster (msk)
- If the array is empty, that means that the function currently has no event sources configured and will not be triggered automatically by any AWS event.
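A sketch of a functions block with a few of the event sources mentioned above (the function, topic, and bucket names are invented for the example):

functions:
  processOrder:
    handler: src/orders/lambda.handler      # handler export in the src/orders/lambda module
    events:
      - httpApi:                            # HTTP endpoint trigger
          path: /orders
          method: post
      - schedule: rate(1 hour)              # periodic (cron-like) invocation
      - sns: order-events                   # invocation via an SNS topic
      - s3:                                 # S3 event trigger
          bucket: order-uploads
          event: s3:ObjectCreated:*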
- plugins: a list of serverless plugins e.g.
- serverless-webpack
- serverless-esbuild
- serverless-offline [https://www.serverless.com/plugins/serverless-offline, https://github.com/dherault/serverless-offline]
- emulates AWS Lambda and API Gateway. It starts an HTTP server that handles the request's lifecycle like APIG does and invokes the handlers.
- sls offline --help
- serverless-plugin-log-subscription
- custom: section for serverless plugin settings, e.g. for esbuild, logSubscription, webpack, etc.
- example: the serverless-plugin-log-subscription plugin has the settings:

logSubscription: {
  enabled: true,
  destinationArn: process.env.SUBSCRIPTION_STREAM,
  roleArn: process.env.SUBSCRIPTION_ROLE,
}

- example: serverless-domain-manager - used to define stage-specific domains:

domains: {
  production: {
    url: "app.api.example.com",
    certificateArn: "arn:aws:acm:us-east-2:123456789012:certificate/a8f8f8e2-95fe-4934-abf2-19dc08138f1f",
  },
  staging: {
    url: "app.staging.example.com",
    certificateArn: "arn:aws:acm:us-east-2:123456789012:certificate/a32e9708-7aeb-495b-87b1-8532a2592eeb",
  },
  dev: {
    url: "",
    certificateArn: "",
  },
}
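The snippets above use the JavaScript/TypeScript (serverless.ts) object syntax; in a serverless.yml file the same custom section would look roughly like this (assuming the same SUBSCRIPTION_STREAM and SUBSCRIPTION_ROLE environment variables):

custom:
  logSubscription:
    enabled: true
    destinationArn: ${env:SUBSCRIPTION_STREAM}   # Serverless variable syntax for reading environment variables
    roleArn: ${env:SUBSCRIPTION_ROLE}
  domains:
    production:
      url: app.api.example.com
      certificateArn: arn:aws:acm:us-east-2:123456789012:certificate/a8f8f8e2-95fe-4934-abf2-19dc08138f1f
    staging:
      url: app.staging.example.com
      certificateArn: arn:aws:acm:us-east-2:123456789012:certificate/a32e9708-7aeb-495b-87b1-8532a2592eeb
    dev:
      url: ""
      certificateArn: ""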
Thursday, 12 June 2025
Useful Kibana DevTools Queries
- "size": 0: No documents returned, just aggregation results.
- "terms": Collects unique values.
- "channel_type.keyword": Use .keyword to aggregate on the raw value (not analyzed text).
- "size": 10000: Max number of buckets (unique values) to return. Adjust as needed.
Friday, 30 May 2025
Introduction to Elastic Agents
Elastic Agents are unified, lightweight software components developed by Elastic to collect, ship, and (optionally) protect data (including logs, metrics, traces, and security events) from your infrastructure to the Elastic Stack (Elasticsearch, Kibana, etc.).
Elastic Agents are not strictly required components in every Elastic Stack deployment, but they play a crucial role in certain scenarios. Here's an explanation based on use cases:
Key Functions of Elastic Agents (When Elastic Agents Are Required)
Unified Data Collection:
- They provide a single, centralized solution to collect various types of observability and security data from hosts, containers, and Kubernetes clusters (logs, metrics, traces, and security data)
- They replace individual Beats (e.g., Filebeat, Metricbeat) for streamlined data ingestion.
Kubernetes Monitoring:
- When deployed on Kubernetes (often as a DaemonSet), Elastic Agent runs on every node, collecting:
- System metrics (CPU, memory, disk, etc.)
- Kubernetes resource metrics (pods, nodes, deployments)
- Logs from nodes and containers
- Security posture and events
Fleet Management:
- Elastic Agents can be centrally managed using Elastic Fleet, allowing you to configure, update, and monitor all agents and their integrations from a single Kibana interface
- Elastic Agents are required when using Fleet, the centralized management interface in Kibana.
- Fleet allows you to:
- Manage agent configurations from a single UI.
- Deploy updates and policies at scale.
- Monitor agent health and performance.
Endpoint Security:
- Elastic Agents are necessary for using Endpoint Security features, like malware detection, endpoint protection, threat monitoring, host intrusion detection, and Kubernetes Security Posture Management (KSPM)
When Elastic Agents Are Not Required:
Traditional Beats Usage:
- If you are already using specific Beats (e.g., Filebeat, Metricbeat, Heartbeat) for data collection and do not need unified management, Elastic Agents are optional.
- Beats can ship data directly to Elasticsearch or Logstash without requiring Fleet or Elastic Agents.
Direct Data Ingestion:
- If you are ingesting data directly into Elasticsearch via APIs, custom applications, or third-party tools, Elastic Agents are not needed.
Standalone Elastic Stack:
- For use cases focused purely on search, analytics, or visualization where data is ingested manually or through custom integrations, Elastic Agents are unnecessary.
Key Considerations:
- Unified Management: Elastic Agents with Fleet simplify large-scale deployments and are recommended for environments with many data sources.
- Compatibility: Elastic is gradually consolidating data collection around Elastic Agents, so they are the future-proof choice for managing observability and security data.
- Flexibility: You can still mix and match Elastic Agents and Beats, depending on your requirements.
How Elastic Agents Work in Kubernetes
Deployment
Leader Election
Data Flow
In summary, Elastic Agents are not mandatory for all Elastic Stack setups, but they are highly beneficial for unified data collection, centralized management, and security monitoring.
How to deploy Elastic stack via Elastic Cloud on Kubernetes (ECK)
- Kubernetes operator
- Automates the deployment, provisioning, management, and orchestration of Elastic applications on Kubernetes, including:
- Elasticsearch
- Kibana
- APM Server
- Beats
- Elastic Agent
- Elastic Maps Server
- Logstash
- eck-operator-crds
- eck-operator
- eck-elasticsearch
- eck-kibana
- eck-fleet-server
- eck-agent
- eck-apm-server
Prerequisites
- AWS EKS cluster with addons
- User with associated arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy
eck-operator-crds
- Elasticsearch
- elasticsearches.elasticsearch.k8s.elastic.co
- Manages Elasticsearch clusters
- Kibana
- kibanas.kibana.k8s.elastic.co
- Manages Kibana instances
- ApmServer
- apmservers.apm.k8s.elastic.co
- Manages APM Servers
- Beat
- beats.beat.k8s.elastic.co
- Manages Beats (Filebeat, Metricbeat, etc.) agents
- EnterpriseSearch
- enterprisesearches.enterprisesearch.k8s.elastic.co
- Manages Enterprise Search instances
eck-operator
- Elasticsearch
- Kibana
- APM Server
- Enterprise Search
- Beats
- Elastic Agent
- Elastic Maps Server
Monitoring the operator
eck-elasticsearch
- Creating or deleting indexes.
- Tracking which nodes are part of the cluster.
- Allocating shards to different nodes.
- Updating and propagating the cluster state across the cluster.
- By default, its name follows this format: <elasticsearch_cluster_name>-es-http
- Its roles are:
- Primary access point: It acts as the main endpoint for clients (such as applications, users, or other services) to interact with the Elasticsearch cluster using the REST API.
- Handles authentication and TLS: The service is secured by default with TLS and basic authentication, managed by the ECK operator.
- Traffic distribution: It routes incoming HTTP (REST API) traffic to all Elasticsearch nodes in our cluster, unless we create custom services for more granular routing (for example, to target only data or ingest nodes).
- Its type is ClusterIP, meaning it is accessible only within the Kubernetes cluster (from other pods or nodes that are part of the same cluster) unless otherwise configured
- The service listens on port 9200 (the default Elasticsearch HTTP port) and load-balances requests to the Elasticsearch pods
- The CA certificate (to trust the service’s TLS certificate)
- The elastic user password (stored in a Kubernetes secret)
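For reference, a minimal Elasticsearch custom resource sketch of the kind the ECK operator manages (the cluster name, version, and node count are examples); with this manifest the HTTP service would be named quickstart-es-http (port 9200) and the elastic user password would be stored in the quickstart-es-elastic-user secret:

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart              # example cluster name -> service quickstart-es-http
spec:
  version: 8.17.0               # example Elastic Stack version
  nodeSets:
    - name: default
      count: 3                  # three Elasticsearch nodes
      config:
        node.store.allow_mmap: false   # avoids the mmap requirement on the host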
eck-kibana
- snapshot repositories. They can be S3-backed.
- Kibana users. Each user is assigned a set of predefined Kibana roles.
- Index lifecycle rules
- Component templates (which reference those lifecycle rules)
eck-fleet-server
- Fleet Integrations, like:
- system
- fleet_server
- elastic_agent
- kibana
- elasticsearch
- kubernetes
- apm
- aws
- Fleet Agent Policies
- Fleet Server can have its own policy
- e.g. sys_monitoring can be disabled while monitor_logs and monitor_metrics are enabled
- Fleet Agents have their own policy
- e.g. all monitoring types enabled
- Fleet Integration Policies
- They define:
- which agent policy should be associated with which integration
- inputs
eck-agent
- Elastic Agent Custom Resource
- The chart creates one or more Elastic Agent custom resources (Agent), which are Kubernetes objects managed by the ECK operator.
- These resources define how Elastic Agents are deployed, configured, and connected to your Elasticsearch and Kibana instances.
- Associated Kubernetes Resources
- The Agent custom resource triggers the ECK operator to create the necessary Kubernetes resources, such as:
- Pods/DaemonSets: Runs the Elastic Agent containers on your nodes. Elastic Agents are typically deployed as pods (usually via a DaemonSet or Deployment) in a namespace such as kube-system or elastic-agent. These pods execute the Elastic Agent binary, which collects data and communicates with the Fleet Server.
- ConfigMaps/Secrets: Stores configuration and credentials for the agents.
- ServiceAccounts, Roles, RoleBindings: Manages permissions for the agents to interact with the Kubernetes API if needed.
- Fleet Integration (Optional)
- The chart can configure Elastic Agents to enroll with Elastic Fleet, allowing for centralized management of agent policies and integrations.
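A rough sketch of an Agent custom resource in Fleet mode, of the kind such a chart might render (the names, version, referenced Kibana and Fleet Server resources, and the service account are assumptions):

apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: elastic-agent                    # example name
spec:
  version: 8.17.0                        # example version (the chart's version attribute)
  mode: fleet                            # enroll the agents with Fleet
  kibanaRef:
    name: kibana                         # assumed Kibana resource name
  fleetServerRef:
    name: fleet-server                   # assumed Fleet Server resource name
  daemonSet:                             # run one agent pod per node
    podTemplate:
      spec:
        serviceAccountName: elastic-agent   # assumed ServiceAccount with the required RBAC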
Typical Use Cases
- Observability: Collect logs, metrics, and traces from Kubernetes workloads and nodes.
- Security: Use Elastic Agent for security monitoring and data shipping.
- Fleet Management: Centrally manage agent configurations using Elastic Fleet.
Example: What we might see deployed
How to Verify
- common.k8s.elastic.co/type=agent <-- this shows that the pod is running an Agent (the Elastic CRD type)
- agent.k8s.elastic.co/name=<helm_installation_name>-eck-agent
- agent.k8s.elastic.co/version=<version set in the version attribute in chart values>
Agents in Kibana UI (Fleet Layer)
- Online: Agent is actively communicating with Fleet Server.
- Offline: Agent has not checked in with Fleet Server recently (default: 2 minutes).
- Fleet (in Kibana) manages policy revisions automatically. When you make any change to an agent policy through the Fleet UI or API, Fleet increments the revision number and distributes the updated policy to all agents enrolled in that policy.
- Users do not manually set or manage the revision number; it is handled by the Fleet management system.
- Change tracking: The revision number helps track when and how a policy has changed. Each agent reports which policy revision it is using, making it easy to see if agents are up to date.
- Troubleshooting: If agents are not behaving as expected, the revision number can help correlate issues with recent policy changes.
- Auditability: While the revision number itself does not provide a full change history, it signals that a change has occurred. (Note: The Fleet UI does not currently provide a detailed revision history with user attribution)