Thursday, 14 May 2026

Introduction to Checkly



How Checkly works

Checkly is a SaaS synthetic monitoring platform — you define "checks" (HTTP requests or browser scripts), Checkly runs them on a schedule from probe locations around the world (or on-demand from CI), records latency/assertions/screenshots, and alerts you when they fail or get slow.
  
  Two main check types:

  - API checks — a single HTTP request with assertions on status, headers, body, response time.
  - Browser checks — a Playwright script run in a real headless Chromium against your deployed app.

There are also multi-step API checks (chain requests, e.g. login → use token → logout) and heartbeat checks (your job pings Checkly; you get an alert if the pings stop).

Heartbeat vs Ping

Heartbeats and pings are both vital network failure-detection mechanisms, but they differ in purpose: Heartbeats are proactive, periodic "I am alive" messages sent by an application to signal it is healthy, while Pings are reactive requests to check if a server is reachable. Heartbeats detect application crashes, while pings detect network downtime.
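The distinction can be sketched in a few lines. This is an illustrative sketch of the two detection styles, not Checkly's implementation; all function and parameter names here are hypothetical.

```typescript
// Heartbeat-style detection: the monitored job pushes "I am alive" pings;
// the monitor flags the job as unhealthy when no ping has arrived within
// the expected period plus a grace window.
function isHeartbeatMissed(
  lastPingMs: number, // timestamp of the last "I am alive" ping
  nowMs: number,      // current time
  periodMs: number,   // expected interval between pings
  graceMs: number     // extra slack before alerting
): boolean {
  return nowMs - lastPingMs > periodMs + graceMs;
}

// Ping-style detection, by contrast, is an active outbound request:
// healthy simply means the request just succeeded.
function isReachable(statusCode: number): boolean {
  return statusCode >= 200 && statusCode < 300;
}
```

A crashed application stops sending heartbeats even if the host still answers pings, which is why the two mechanisms complement each other.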

Checks are typically authored as code (Checkly CLI, TypeScript) and checkly deploy'd to the cloud. You can tag them (tags: ["auth"]), parametrise them with env vars like ENVIRONMENT_URL, and trigger them on-demand from CI — which is exactly what this PR does with npx checkly trigger --tags=auth.
  
  Runtime model:
  - Scheduled: every N minutes from chosen regions (e.g. us-east-2, eu-west-1) — catches regressions/outages between deploys.
  - Triggered from CI: post-deploy smoke test, results gate (or just annotate) the deploy.
  - Alerts: Slack/PagerDuty/email on failure, with retry/degraded thresholds to avoid flap.

  ---

  What it would check for this auth API
  
  Given the auth API's surface (login, OAuth, JWT issuance, admin endpoints), realistic auth-tagged checks:

  1. Health endpoint — basic liveness

  new ApiCheck("auth-health", {
    name: "Auth API – health",
    tags: ["auth"],
    frequency: 1, // minute
    locations: ["us-east-2", "eu-west-1"],
    request: {
      url: `${process.env.ENVIRONMENT_URL}/health`,
      method: "GET",
      assertions: [
        AssertionBuilder.statusCode().equals(200),
        AssertionBuilder.responseTime().lessThan(500),
        AssertionBuilder.jsonBody("$.status").equals("ok"),
      ],
    },
  });

  2. Login flow — happy path, returns a JWT

  new ApiCheck("auth-login", {
    name: "Auth API – login returns JWT",
    tags: ["auth"],
    request: {
      url: `${process.env.ENVIRONMENT_URL}/auth/login`,
      method: "POST",
      headers: [{ key: "Content-Type", value: "application/json" }],
      body: JSON.stringify({
        email: process.env.SYNTHETIC_USER_EMAIL,
        password: process.env.SYNTHETIC_USER_PASSWORD,
      }),
      assertions: [
        AssertionBuilder.statusCode().equals(200),
        AssertionBuilder.responseTime().lessThan(1500),
        AssertionBuilder.jsonBody("$.token").isNotNull(),
        // structural check on JWT shape
        AssertionBuilder.jsonBody("$.token").matches("^eyJ[A-Za-z0-9_-]+\\.[A-Za-z0-9_-]+\\.[A-Za-z0-9_-]+$"),
      ],
    },
  });
  
  3. Login — wrong password returns 401 (negative path)

  Catches the "accidentally accepts anything" class of regression.

  new ApiCheck("auth-login-bad-pw", {
    name: "Auth API – wrong password = 401",
    tags: ["auth"],
    request: {
      url: `${process.env.ENVIRONMENT_URL}/auth/login`,
      method: "POST",
      headers: [{ key: "Content-Type", value: "application/json" }],
      body: JSON.stringify({ email: process.env.SYNTHETIC_USER_EMAIL, password: "wrong" }),
      assertions: [AssertionBuilder.statusCode().equals(401)],
    },
  });
  
  4. Multi-step — login then call protected endpoint

  This is the most useful kind for an auth API, because it proves the token actually works.

  new MultiStepCheck("auth-token-roundtrip", {
    name: "Auth API – token works against /me",
    tags: ["auth"],
    code: { entrypoint: path.join(__dirname, "token-roundtrip.spec.ts") },
  });
  // token-roundtrip.spec.ts
  import { test, expect } from "@playwright/test";
  test("login then /me", async ({ request }) => {
    const login = await request.post(`${process.env.ENVIRONMENT_URL}/auth/login`, {
      data: { email: process.env.SYNTHETIC_USER_EMAIL, password: process.env.SYNTHETIC_USER_PASSWORD },
    });
    expect(login.ok()).toBeTruthy();
    const { token } = await login.json();
    
    const me = await request.get(`${process.env.ENVIRONMENT_URL}/me`, {
      headers: { Authorization: `Bearer ${token}` },
    });
    expect(me.status()).toBe(200);
    const body = await me.json();
    expect(body.email).toBe(process.env.SYNTHETIC_USER_EMAIL);
  });
  
  5. TLS & cert expiry

  A pure config check — useful because cert rotation is a classic outage cause.

  new ApiCheck("auth-tls", {
    name: "Auth API – TLS cert valid > 14d",
    tags: ["auth"],
    request: {
      url: `${process.env.ENVIRONMENT_URL}/health`,
      method: "GET",
      assertions: [AssertionBuilder.statusCode().equals(200)],
    },
    // Checkly surfaces cert expiry on the run; you set a threshold per check
  });
  
  6. Browser check — full login UX

  Runs against the front-end but exercises the auth API end-to-end including redirects, cookies, CSRF.

  new BrowserCheck("auth-ui-login", {
    name: "Login UI works",
    tags: ["auth"],
    code: { entrypoint: path.join(__dirname, "login.spec.ts") },
  });
  // login.spec.ts
  import { test, expect } from "@playwright/test";
  test("user can sign in", async ({ page }) => {
    await page.goto(process.env.ENVIRONMENT_URL!);
    await page.getByLabel("Email").fill(process.env.SYNTHETIC_USER_EMAIL!);
    await page.getByLabel("Password").fill(process.env.SYNTHETIC_USER_PASSWORD!);
    await page.getByRole("button", { name: "Sign in" }).click();
    await expect(page.getByText("Dashboard")).toBeVisible({ timeout: 10_000 });
  });

  7. OAuth callback reachability

  Doesn't fully exercise the Google/Microsoft flow (those need real consent), but checks the callback
  endpoint responds correctly to a missing-code request — confirms route + handler are wired.

  new ApiCheck("auth-oauth-google-callback-shape", {
    name: "Auth API – Google OAuth callback exists",
    tags: ["auth"],
    request: {
      url: `${process.env.ENVIRONMENT_URL}/auth/google/callback`,
      method: "GET",
      assertions: [
        // 400 for missing `code`, not 404/500 — proves handler is mounted
        AssertionBuilder.statusCode().equals(400),
      ],
    },
  });

Intro to QA with Headless Browsers

Headless browsers are used in QA to execute automated browser tests faster and more efficiently by eliminating the graphical user interface (GUI). Because they don't render visuals, they consume fewer resources, enabling rapid, parallel testing in CI/CD pipelines, making them ideal for high-volume functional and regression testing.

Key Reasons for Using Headless Browsers in QA:
  • Faster Execution: Without the need to render CSS, images, or layout, tests run significantly faster.
  • CI/CD Integration: They are ideal for server-side environments where a GUI is unavailable, allowing automated tests to run after every code commit.
  • Lower Resource Usage: They consume significantly less RAM and CPU, allowing for higher parallelization (running many tests simultaneously) without overloading hardware.
  • Automated Functional Testing: They can accurately simulate user actions such as clicking buttons, submitting forms, and navigating pages.
  • Regression Testing: Due to speed and efficiency, they are perfect for running large suites of regression tests to ensure new changes haven't broken existing functionality.
Common tools for headless testing include headless Chrome and Firefox, driven by frameworks such as Puppeteer and Playwright.

Headless browsers parse, compile, and execute the exact same underlying code as standard browsers, but they skip the final step of painting pixels to a physical screen.

What Headless Browsers Still Do
  • Construct the DOM: They parse HTML into a full Document Object Model tree.
  • Apply Styling: They process CSS and calculate layout, element positions, and visibility.
  • Execute JavaScript: They run a full JS engine (like V8 in Chrome) to handle AJAX, animations, and frontend logic.
  • Manage Network Traffic: They make real HTTP requests, download cookies, and handle API responses.

How QA Verifies Visuals Without a Display
  • Layout Queries: Code checks if elements are present, hidden, or overlapping by querying their coordinates.
  • Computed Styles: Scripts verify specific CSS properties, like checking if a button color is exactly rgb(0, 0, 255).
  • Virtual Screenshots: The browser renders the page into an in-memory buffer, allowing QA tools to save PNGs or perform pixel-by-pixel visual regression comparisons.
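A layout query of the kind described above reduces to plain geometry once the browser hands back bounding boxes (e.g. from Playwright's boundingBox()). A minimal sketch, with hypothetical element boxes:

```typescript
// With no screen, headless QA code reasons about element bounding boxes.
// Two axis-aligned boxes overlap iff they overlap on both the x and y axes.
interface Box { x: number; y: number; width: number; height: number; }

function boxesOverlap(a: Box, b: Box): boolean {
  return a.x < b.x + b.width && b.x < a.x + a.width &&
         a.y < b.y + b.height && b.y < a.y + a.height;
}

// Hypothetical regression: a button accidentally rendered on top of a banner.
const button: Box = { x: 10, y: 10, width: 100, height: 40 };
const banner: Box = { x: 0, y: 30, width: 400, height: 60 };
```

The same pattern covers "is this element hidden" (zero-sized box) and "do these two elements collide" checks without ever painting a pixel.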

To tailor headless-browser testing to our workflow, we need to know:
  • Which testing framework are we using (e.g., Playwright, Selenium, Cypress)?
  • Are we trying to catch functional bugs or visual layout glitches?
  • Do our tests run on a local machine or a CI/CD server (e.g., GitHub Actions, Jenkins)?

---

Wednesday, 29 April 2026

Introduction to Amazon Simple Notification Service (SNS)



Amazon Simple Notification Service (SNS) is a fully managed messaging service that enables you to decouple microservices, distributed systems, and serverless applications. Here's how SNS works:

Key Concepts


Topics:

A topic is a logical access point and communication channel. Publishers send messages to a topic, and subscribers receive these messages by subscribing to the topic.

Publishers:

Publishers are entities that send messages to an SNS topic. They could be applications, services, or even other AWS services like Lambda or CloudWatch.

Subscribers:

Subscribers are endpoints that receive messages from an SNS topic. These can include Amazon SQS queues, AWS Lambda functions, HTTP/S endpoints, email addresses, and SMS numbers.

Messages:

Messages are the payload sent by publishers to SNS topics. They can include a variety of data formats, typically JSON.



How It Works


Creating a Topic:

First, you create a topic using the AWS Management Console, AWS CLI, or AWS SDKs. This topic acts as a communication channel.

Subscribing to a Topic:

You then subscribe one or more endpoints to the topic. These endpoints can be other AWS services or external services capable of receiving notifications.

When subscribing, you specify the protocol (such as HTTP, SQS, Lambda, etc.) and the endpoint (like the URL or ARN of the SQS queue).

Publishing a Message:

Publishers send messages to the SNS topic using the Publish API. The message can include a subject, a message body, and optional attributes.

SNS stores multiple copies of the message for redundancy and high availability.

Message Delivery:

SNS distributes the message to all subscribed endpoints.

Each endpoint processes the message according to its protocol. For example, an HTTP endpoint receives a POST request with the message content, and an SQS queue receives the message as a new queue entry.
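The topic/publisher/subscriber flow above can be sketched as a tiny in-process model — this is the messaging pattern only, not the AWS SDK:

```typescript
// Minimal in-process sketch of the SNS model: a topic pushes a copy of
// every published message to each subscribed endpoint (fan-out).
type Handler = (message: string) => void;

class Topic {
  private subscribers: Handler[] = [];

  // Subscribing registers an endpoint (in SNS: SQS, Lambda, HTTP/S, ...).
  subscribe(handler: Handler): void {
    this.subscribers.push(handler);
  }

  // Publishing delivers to every subscriber; subscribers never pull.
  publish(message: string): void {
    for (const handler of this.subscribers) handler(message);
  }
}
```

Against the real service, the same three steps are the CreateTopic, Subscribe, and Publish API calls (e.g. via the AWS SDK's SNS client).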



Use Cases


Fan-out Scenarios:

When a message published to an SNS topic needs to be sent to multiple endpoints, SNS acts as a fan-out service. For example, updating various microservices or notifying multiple systems about an event.

Push Notifications:

SNS can be used to send push notifications to mobile devices through services like Amazon Device Messaging (ADM), Apple Push Notification service (APNs), and Firebase Cloud Messaging (FCM, formerly Google Cloud Messaging).

Application Alerts:

Sending notifications or alerts to administrators or users via SMS, email, or other endpoints.

Event-Driven Computing:

Integrating with AWS Lambda to trigger functions in response to incoming messages, enabling event-driven processing.



Benefits


Scalability:

SNS is designed to handle high-throughput and large volumes of messages, making it suitable for large-scale applications.


Flexibility:

It supports multiple protocols, allowing for versatile messaging architectures.


Reliability:

With built-in redundancy and fault tolerance, SNS ensures reliable message delivery.


Ease of Use:

The service is fully managed, reducing the operational overhead associated with maintaining messaging infrastructure.


Security



Access Control:

Using AWS IAM policies, you can control who can publish or subscribe to topics.


Encryption:

SNS supports server-side encryption (SSE) for messages at rest and SSL/TLS for messages in transit.

Compliance:

SNS is compliant with various industry standards and regulations, ensuring it meets security and compliance requirements.

Amazon SNS is a powerful and flexible tool for building loosely coupled, scalable, and robust messaging solutions in the cloud.



How does Amazon SNS differ from Amazon SQS?



Amazon SNS (Simple Notification Service) and Amazon SQS (Simple Queue Service) are both messaging services provided by AWS, but they serve different purposes and have distinct features. Here's a comparison highlighting their differences:

Purpose and Use Cases


Amazon SNS:
  • Publish/Subscribe Messaging: 
    • SNS is designed for sending notifications or messages to multiple subscribers.
    • It's ideal for broadcasting messages to multiple endpoints, such as sending alerts, notifications, or updates to different systems or users.
  • Use Cases: 
    • Real-time notifications
    • push notifications
    • fan-out scenarios (where a message needs to be sent to multiple recipients simultaneously)
    • application alerts
    • event-driven architectures

Amazon SQS:
  • Message Queuing: 
    • SQS is designed for decoupling and scaling distributed systems. 
    • It allows you to send, store, and receive messages between software components at any volume, without losing messages or requiring other services to be available.
  • Use Cases:
    • Task queues
    • asynchronous processing
    • decoupling microservices
    • job dispatching
    • buffering messages between producer and consumer systems



Messaging Patterns



Amazon SNS:

  • Push-Based: SNS pushes messages to subscribers. Subscribers can be other AWS services (like Lambda, SQS), HTTP/S endpoints, email addresses, SMS numbers, and mobile push notifications.
  • Fan-Out: One message can be sent to multiple subscribers.

Amazon SQS:

  • Pull-Based: Consumers pull messages from the queue. A consumer explicitly retrieves messages from the queue.
  • Point-to-Point: Each message is delivered to and processed by one consumer.


Message Handling


Amazon SNS:

  • Real-Time Delivery: Messages are delivered immediately to all subscribers.
  • No Message Persistence: Messages are not stored after delivery; if a subscriber is unavailable, the message is lost unless it's sent to an SQS queue or some other durable store.

Amazon SQS:

  • Message Persistence: Messages are stored in the queue until they are processed and deleted by a consumer, or until they expire.
  • Delivery Guarantees: Standard queues ensure at-least-once delivery. With FIFO queues, SQS provides exactly-once processing and message ordering.

Scalability and Performance


Amazon SNS:

  • Scalable: Designed to handle massive numbers of messages and deliver them to large numbers of subscribers.
  • Latency: Typically has very low latency for message delivery.

Amazon SQS:

  • Scalable: Automatically scales to handle large volumes of messages. Suitable for high-throughput applications.
  • Latency: Slightly higher latency compared to SNS due to the nature of pull-based consumption.

Features and Capabilities


Amazon SNS:

  • Multiple Protocols: Supports multiple delivery protocols including HTTP/S, email, SMS, SQS, Lambda, and mobile push notifications.
  • Filtering: Allows message filtering, enabling subscribers to receive only the messages that match their filter policies.
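The filtering idea is simple to sketch: a subscription's filter policy maps attribute names to allowed values, and a message is delivered only if every policy key matches. This is a simplified model — real SNS filter policies also support prefix matching, numeric ranges, anything-but, and more:

```typescript
// Simplified SNS-style attribute filtering: a message matches a policy
// iff, for every key in the policy, the message carries that attribute
// with one of the listed values.
type Attributes = Record<string, string>;
type FilterPolicy = Record<string, string[]>;

function matchesFilterPolicy(attrs: Attributes, policy: FilterPolicy): boolean {
  return Object.entries(policy).every(
    ([key, allowed]) => key in attrs && allowed.includes(attrs[key])
  );
}
```

Filtering this way moves routing logic out of subscriber code: each subscriber declares what it wants instead of discarding irrelevant messages after delivery.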

Amazon SQS:

  • Visibility Timeout: Temporarily hides a message from other consumers while it is being processed.
  • Dead-Letter Queues (DLQ): Allows you to handle messages that can't be processed successfully.
  • FIFO Queues: Ensures the order of messages and exactly-once processing.
  • Delay Queues: Postpones the delivery of new messages to consumers for a specified amount of time.
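The visibility timeout is the least obvious of these, so here is a toy in-memory model of it (illustrative only, not how SQS is implemented): a received message is hidden rather than deleted, and reappears if the consumer never deletes it.

```typescript
// Toy queue illustrating the SQS visibility timeout: receive() hides a
// message until the consumer deletes it or the timeout elapses, after
// which the message becomes receivable again (at-least-once delivery).
interface QueueMessage { id: number; body: string; invisibleUntil: number; }

class VisibilityQueue {
  private messages: QueueMessage[] = [];
  private nextId = 1;

  send(body: string): number {
    const id = this.nextId++;
    this.messages.push({ id, body, invisibleUntil: 0 });
    return id;
  }

  // Returns the first visible message and hides it for the timeout window.
  receive(nowMs: number, visibilityTimeoutMs: number): QueueMessage | undefined {
    const msg = this.messages.find((m) => m.invisibleUntil <= nowMs);
    if (msg) msg.invisibleUntil = nowMs + visibilityTimeoutMs; // hide, don't delete
    return msg;
  }

  // Only an explicit delete removes the message for good.
  delete(id: number): void {
    this.messages = this.messages.filter((m) => m.id !== id);
  }
}
```

A crashed consumer simply never calls delete, so the message resurfaces for another consumer — which is exactly why a dead-letter queue is needed for messages that keep failing.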


Pricing


Amazon SNS:

  • Pricing Model: Based on the number of requests (publishes, deliveries, and notifications) and data transfer.
  • Cost Efficiency: More cost-effective for scenarios requiring a high number of subscribers and real-time notifications.

Amazon SQS:

  • Pricing Model: Based on the number of requests (send, receive, delete) and data transfer.
  • Cost Efficiency: More cost-effective for decoupling microservices and scenarios requiring message persistence and complex message handling.


Integration and Interoperability


Amazon SNS:

  • Integration: Easily integrates with a wide range of AWS services (e.g., Lambda, SQS, HTTP/S endpoints, etc.).
  • Interoperability: Often used in conjunction with SQS for fan-out scenarios where messages need to be processed asynchronously and stored reliably.

Amazon SQS:

  • Integration: Commonly used to decouple systems and provide reliable message delivery. Often used with other AWS services like Lambda, ECS, and EC2.
  • Interoperability: Can be subscribed to SNS topics to receive messages that need persistent storage or further processing.

Summary


In summary, Amazon SNS is a pub/sub messaging service optimized for real-time notifications and broadcasting messages to multiple subscribers, while Amazon SQS is a message queuing service designed for decoupling distributed systems and ensuring reliable message delivery through persistence and processing guarantees. They are often used together to build scalable, resilient, and flexible messaging architectures in AWS.


Push Notification Service - Amazon Simple Notification Service - AWS

Tuesday, 21 April 2026

Provisioning AWS EKS Cluster with terraform-aws-modules/eks/aws





In this article we explore this module and break down its key components and their purposes.

We'd typically use this module like this:

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "21.15.1"
  ...
}


Let's explore this module's attributes.

1. Cluster Configuration


name, version

Sets the name and Kubernetes version for the EKS cluster. Use local and variable values for flexibility.

endpoint_public_access

Enables public (Internet) access to the Kubernetes API endpoint (the one kubectl talks to). Disable it for enhanced security.

endpoint_private_access

Controls private access to the API endpoint, i.e. whether resources within the VPC can reach it directly. With public access disabled, the endpoint is only reachable from within the VPC (Virtual Private Cloud) where your EKS cluster is deployed. There are a few ways to access it:

How to Access the Kubernetes API from VPC

1. Use a Bastion Host or EC2 Instance in the VPC

Launch an EC2 instance (bastion host or jump box) in a subnet within the same VPC as your EKS cluster.
SSH into this instance, and from there, use kubectl to access the cluster.
Alternatively, use SSH port forwarding or a VPN to proxy kubectl commands from your local machine through the bastion.

2. Use AWS Systems Manager (SSM) Session Manager

If your EC2 instances have the SSM agent and the necessary IAM permissions, you can use AWS SSM Session Manager to start a shell session on an instance in the VPC, then run kubectl from there.

3. Use a VPN Connection

Set up a VPN (such as AWS Client VPN or OpenVPN, or Site-to-site VPN for office LAN) that connects your local network to the VPC. Once connected, your local machine will be able to reach the private endpoint.

4. Use AWS PrivateLink (Interface VPC Endpoints)

For advanced scenarios, you can use AWS PrivateLink to expose the Kubernetes API endpoint privately to other VPCs or on-premises networks.


enable_cluster_creator_admin_permissions


If enabled, grants admin permissions to the user who creates the cluster.


2. Logging and Add-ons


enabled_log_types

Enables logging for various Kubernetes components (API, audit, authenticator, controllerManager, scheduler) for monitoring and troubleshooting.

Example:

  enabled_log_types = [
    "api",
    "audit",
    "authenticator",
    "controllerManager",
    "scheduler"
  ]

addons

A dictionary-type attribute which installs and configures essential Kubernetes add-ons. Dictionary keys are addon names like:
  • coredns
  • kube-proxy
  • aws-ebs-csi-driver
  • vpc-cni

Dictionary values are objects whose attributes are:
  • most_recent - to set using the latest version (set it to false for version pinning)
  • version - addon version (use it for version pinning)
  • before_compute - set it to true if addon should be installed and set before nodes (compute layer)
  • service_account_role_arn - to configure addon with IAM roles for service accounts, enabling secure integration with AWS services.

Example:

addons = {
    ...
    vpc-cni = {
      most_recent              = false
      version                  = "v1.21.1-eksbuild.7"
      before_compute           = true
      service_account_role_arn = module.k8s_default_vpc_cni_irsa.iam_role_arn
    }
    ...
}

VPC CNI (Container Network Interface) is responsible for allocating IP addresses to the Kubernetes nodes and provides networking to pods. The plugin manages elastic network interfaces (ENIs) on the nodes and uses them to assign IP addresses to pods.



3. Networking

We need to integrate the EKS cluster with existing VPC and subnets:

vpc_id 

VPC ID

subnet_ids

Subnets in which worker nodes (EC2 instances) will be created and run.

control_plane_subnet_ids

Defines where the EKS control plane creates its Elastic Network Interfaces (ENIs).

What it controls:
  • The EKS control plane runs in an AWS-managed VPC (you don't see it)
  • To communicate with your worker nodes, it creates ENIs in your VPC
  • These ENIs are placed in the subnets you specify here

Typical configuration:

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  
  name = "my-cluster"
  
  # Control plane ENIs go here
  control_plane_subnet_ids = [
    "subnet-private-1a",
    "subnet-private-1b",
    "subnet-private-1c"
  ]
}

Best practices:
  • Usually private subnets
  • Should span multiple AZs for high availability (AWS requires at least 2)
  • Minimum of 2 subnets, maximum of 16
  • Each subnet needs at least 5 available IP addresses
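The "at least 5 available IP addresses" requirement follows from AWS subnet arithmetic: AWS reserves 5 addresses in every subnet (network address, VPC router, DNS, one reserved for future use, and broadcast), so usable capacity is:

```typescript
// Usable IP addresses in an AWS subnet: total addresses for the prefix
// minus the 5 that AWS reserves (first four plus the broadcast address).
function usableSubnetIps(prefixLength: number): number {
  return 2 ** (32 - prefixLength) - 5;
}
```

So even a tiny /28 subnet (16 addresses, 11 usable) comfortably satisfies the control-plane ENI requirement, though larger subnets leave headroom for pods when the VPC CNI assigns pod IPs from the same ranges.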

What these ENIs do:
  • Allow the control plane to communicate with worker nodes
  • Allow worker nodes to communicate with the API server
  • Handle API server endpoint traffic


security_group_additional_rules


Adds custom security group rules for the cluster, such as allowing node-to-node communication and VPN access for kubectl.

node_security_group_additional_rules


Further customizes node security groups, allowing all node-to-node traffic and all outbound traffic.



Understanding EKS Architecture

An EKS cluster has two main components:

┌─────────────────────────────────────────────────────────┐
│                    EKS Cluster                          │
│                                                         │
│  ┌───────────────────────────────────────┐              │
│  │   Control Plane (AWS Managed)         │              │
│  │   - API Server                        │              │
│  │   - etcd                              │              │
│  │   - Scheduler                         │              │
│  │   - Controller Manager                │              │
│  │                                       │              │
│  │   Runs in AWS-managed account         │              │
│  └──────────────┬────────────────────────┘              │
│                 │                                       │
│                 │ ENIs in your VPC                      │
│                 │ (control_plane_subnet_ids)            │
│  ┌──────────────▼────────────────────────┐              │
│  │   Your VPC                            │              │
│  │   ┌─────────────────────────────┐     │              │
│  │   │  Worker Nodes (subnet_ids)  │     │              │
│  │   │  - EC2 instances            │     │              │
│  │   │  - Your pods run here       │     │              │
│  │   └─────────────────────────────┘     │              │
│  └───────────────────────────────────────┘              │
└─────────────────────────────────────────────────────────┘

ENI: elastic network interface. It is a logical networking component in a VPC that represents a virtual network card.



4. Node Group Configuration


node_security_group_tags


Adds a tag for Karpenter (an open-source Kubernetes node autoscaler) discovery.

eks_managed_node_group_defaults


Sets default properties for all managed node groups, including:
  • Attaching the CNI policy for networking.
  • Using a specific SSH key.
  • Associating additional security groups.
  • Defining block device mappings for EBS volumes.
  • Attaching the AmazonSSMManagedInstanceCore policy for SSM access.

eks_managed_node_groups


Defines a default managed node group with:
  • A specific AMI type.
  • Desired, minimum, and maximum node counts.
  • Instance types from a variable.
  • On-demand capacity, EBS optimization, and disk size.
  • Custom labels for node identification and environment.

The gold standard for production environments is explicit pinning. This ensures that our infrastructure only changes when we decide to change the code. In order to pin AMI version used in node groups we need to set two attributes:
  • ami_release_version needs to be set. This prevents nodes from cycling unexpectedly during a routine deployment.
  • use_latest_ami_release_version needs to be set to false (without this, terraform plan will still show that it wants to upgrade AMI version, even if we've set ami_release_version)

Example:

  eks_managed_node_groups = {
    "${local.cluster_name}-v1_33" = {
      ...
      ami_release_version            = "1.33.8-20260224"
      use_latest_ami_release_version = false
      ...


5. Tagging


tags


Applies custom tags to all AWS resources created by the module, supporting cost allocation and resource management.


Summary



Our configuration sets up a secure, private, and production-ready EKS cluster with managed node groups, essential add-ons, robust logging, and fine-grained network and IAM controls. It leverages best practices for security (private endpoints, IAM roles for service accounts), scalability (managed node groups, Karpenter tags), and maintainability (modular, versioned, and tagged infrastructure).


---

Wednesday, 15 April 2026

Core Security Practices in DevSecOps & Software Engineering

 


Integrating security into DevOps and software engineering, often called DevSecOps, is a critical shift from treating security as a final checkpoint to embedding it throughout the entire development lifecycle. 

Here are the best security practices, with a specific focus on secrets management and key rotation.

Core Security Practices in DevSecOps & Software Engineering


1. Shift Left


This is the foundational principle of DevSecOps. It means introducing security testing and considerations as early as possible in the software development life cycle (SDLC).
  • Why: It is significantly cheaper and faster to fix a security flaw during the design or coding phase than it is after deployment.
  • Action: Conduct threat modeling during design, use secure coding standards, and run security scans on every code commit.

2. Automate Security Testing


Manual security reviews cannot keep up with the speed of DevOps. Automation is essential.
  • Static Application Security Testing (SAST): Scans your source code for known vulnerabilities (like SQL injection or cross-site scripting) without running the application. Tools: SonarQube, CodeQL.
  • Dynamic Application Security Testing (DAST): Tests the running application from the outside, mimicking an attacker to find runtime vulnerabilities. Tools: OWASP ZAP, Burp Suite.
  • Software Composition Analysis (SCA): Analyzes your application’s dependencies (open-source libraries) for known vulnerabilities. Tools: Snyk, Dependabot, OWASP Dependency-Check.

3. Implement the Principle of Least Privilege (PoLP)


Every user, process, and system should have only the minimum permissions necessary to perform its function.

Action:
  • Developers should not have administrative access to production environments.
  • CI/CD pipelines should use dedicated service accounts with tightly scoped permissions (e.g., a pipeline deploying to a specific AWS S3 bucket should only have s3:PutObject permissions on that bucket).
  • Use Role-Based Access Control (RBAC) to manage permissions.

4. Secure the CI/CD Pipeline


The pipeline itself is a high-value target for attackers. If they compromise the pipeline, they can inject malicious code into your production application.

Action:
  • Lock down pipeline configurations: Require code reviews for any changes to pipeline definition files (e.g., .github/workflows/*.yml).
  • Use code signing: Digitally sign your build artifacts (containers, binaries) to ensure their integrity and origin.
  • Monitor pipeline logs: Look for unauthorized changes or suspicious activity.

Friday, 10 April 2026

How to install and setup Claude Code on MacOS + VS Code

 

Let's follow steps from Quickstart - Claude Code Docs:

% curl -fsSL https://claude.ai/install.sh | bash
Setting up Claude Code...

✔ Claude Code successfully installed!        
                                                                       
  Version: 2.1.100
                                                                       
  Location: ~/.local/bin/claude

  Next: Run claude --help to get started

⚠ Setup notes:
  • Native installation exists but ~/.local/bin is not in your PATH. Run:

  echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.zshrc && source ~/.zshrc


✅ Installation complete!



Let's add the bin directory to PATH by appending the export to the zsh config and reloading it:

% echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.zshrc && source ~/.zshrc


If you use Bash, append to ~/.bashrc instead and reload it:

echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc && source ~/.bashrc


Verification:

$HOME/.local/bin is now in $PATH:

% echo $PATH
/Users/bojan/.local/bin:....


Let's check Claude version:

% claude --version
2.1.100 (Claude Code)

Let's also see its CLI arguments:

% claude --help
Usage: claude [options] [command] [prompt]

Claude Code - starts an interactive session by default, use -p/--print for non-interactive output

Arguments:
  prompt                                            Your prompt

Options:
  --add-dir <directories...>                        Additional directories to allow tool access to
  --agent <agent>                                   Agent for the current session. Overrides the 'agent' setting.
  --agents <json>                                   JSON object defining custom agents (e.g. '{"reviewer": {"description": "Reviews code", "prompt": "You are a code
                                                    reviewer"}}')
  --allow-dangerously-skip-permissions              Enable bypassing all permission checks as an option, without it being enabled by default. Recommended only for
                                                    sandboxes with no internet access.
  --allowedTools, --allowed-tools <tools...>        Comma or space-separated list of tool names to allow (e.g. "Bash(git:*) Edit")
  --append-system-prompt <prompt>                   Append a system prompt to the default system prompt
  --bare                                            Minimal mode: skip hooks, LSP, plugin sync, attribution, auto-memory, background prefetches, keychain reads, and
                                                    CLAUDE.md auto-discovery. Sets CLAUDE_CODE_SIMPLE=1. Anthropic auth is strictly ANTHROPIC_API_KEY or apiKeyHelper via
                                                    --settings (OAuth and keychain are never read). 3P providers (Bedrock/Vertex/Foundry) use their own credentials.
                                                    Skills still resolve via /skill-name. Explicitly provide context via: --system-prompt[-file],
                                                    --append-system-prompt[-file], --add-dir (CLAUDE.md dirs), --mcp-config, --settings, --agents, --plugin-dir.
  --betas <betas...>                                Beta headers to include in API requests (API key users only)
  --brief                                           Enable SendUserMessage tool for agent-to-user communication
  --chrome                                          Enable Claude in Chrome integration
  -c, --continue                                    Continue the most recent conversation in the current directory
  --dangerously-skip-permissions                    Bypass all permission checks. Recommended only for sandboxes with no internet access.
  -d, --debug [filter]                              Enable debug mode with optional category filtering (e.g., "api,hooks" or "!1p,!file")
  --debug-file <path>                               Write debug logs to a specific file path (implicitly enables debug mode)
  --disable-slash-commands                          Disable all skills
  --disallowedTools, --disallowed-tools <tools...>  Comma or space-separated list of tool names to deny (e.g. "Bash(git:*) Edit")
  --effort <level>                                  Effort level for the current session (low, medium, high, max)
  --exclude-dynamic-system-prompt-sections          Move per-machine sections (cwd, env info, memory paths, git status) from the system prompt into the first user
                                                    message. Improves cross-user prompt-cache reuse. Only applies with the default system prompt (ignored with
                                                    --system-prompt). (default: false)
  --fallback-model <model>                          Enable automatic fallback to specified model when default model is overloaded (only works with --print)
  --file <specs...>                                 File resources to download at startup. Format: file_id:relative_path (e.g., --file file_abc:doc.txt file_def:img.png)
  --fork-session                                    When resuming, create a new session ID instead of reusing the original (use with --resume or --continue)
  --from-pr [value]                                 Resume a session linked to a PR by PR number/URL, or open interactive picker with optional search term
  -h, --help                                        Display help for command
  --ide                                             Automatically connect to IDE on startup if exactly one valid IDE is available
  --include-hook-events                             Include all hook lifecycle events in the output stream (only works with --output-format=stream-json)
  --include-partial-messages                        Include partial message chunks as they arrive (only works with --print and --output-format=stream-json)
  --input-format <format>                           Input format (only works with --print): "text" (default), or "stream-json" (realtime streaming input) (choices:
                                                    "text", "stream-json")
  --json-schema <schema>                            JSON Schema for structured output validation. Example:
                                                    {"type":"object","properties":{"name":{"type":"string"}},"required":["name"]}
  --max-budget-usd <amount>                         Maximum dollar amount to spend on API calls (only works with --print)
  --mcp-config <configs...>                         Load MCP servers from JSON files or strings (space-separated)
  --mcp-debug                                       [DEPRECATED. Use --debug instead] Enable MCP debug mode (shows MCP server errors)
  --model <model>                                   Model for the current session. Provide an alias for the latest model (e.g. 'sonnet' or 'opus') or a model's full name
                                                    (e.g. 'claude-sonnet-4-6').
  -n, --name <name>                                 Set a display name for this session (shown in /resume and terminal title)
  --no-chrome                                       Disable Claude in Chrome integration
  --no-session-persistence                          Disable session persistence - sessions will not be saved to disk and cannot be resumed (only works with --print)
  --output-format <format>                          Output format (only works with --print): "text" (default), "json" (single result), or "stream-json" (realtime
                                                    streaming) (choices: "text", "json", "stream-json")
  --permission-mode <mode>                          Permission mode to use for the session (choices: "acceptEdits", "auto", "bypassPermissions", "default", "dontAsk",
                                                    "plan")
  --plugin-dir <path>                               Load plugins from a directory for this session only (repeatable: --plugin-dir A --plugin-dir B) (default: [])
  -p, --print                                       Print response and exit (useful for pipes). Note: The workspace trust dialog is skipped when Claude is run with the
                                                    -p mode. Only use this flag in directories you trust.
  --remote-control-session-name-prefix <prefix>     Prefix for auto-generated Remote Control session names (default: hostname)
  --replay-user-messages                            Re-emit user messages from stdin back on stdout for acknowledgment (only works with --input-format=stream-json and
                                                    --output-format=stream-json)
  -r, --resume [value]                              Resume a conversation by session ID, or open interactive picker with optional search term
  --session-id <uuid>                               Use a specific session ID for the conversation (must be a valid UUID)
  --setting-sources <sources>                       Comma-separated list of setting sources to load (user, project, local).
  --settings <file-or-json>                         Path to a settings JSON file or a JSON string to load additional settings from
  --strict-mcp-config                               Only use MCP servers from --mcp-config, ignoring all other MCP configurations
  --system-prompt <prompt>                          System prompt to use for the session
  --tmux                                            Create a tmux session for the worktree (requires --worktree). Uses iTerm2 native panes when available; use
                                                    --tmux=classic for traditional tmux.
  --tools <tools...>                                Specify the list of available tools from the built-in set. Use "" to disable all tools, "default" to use all tools,
                                                    or specify tool names (e.g. "Bash,Edit,Read").
  --verbose                                         Override verbose mode setting from config
  -v, --version                                     Output the version number
  -w, --worktree [name]                             Create a new git worktree for this session (optionally specify a name)

Commands:
  agents [options]                                  List configured agents
  auth                                              Manage authentication
  auto-mode                                         Inspect auto mode classifier configuration
  doctor                                            Check the health of your Claude Code auto-updater. Note: The workspace trust dialog is skipped and stdio servers from
                                                    .mcp.json are spawned for health checks. Only use this command in directories you trust.
  install [options] [target]                        Install Claude Code native build. Use [target] to specify version (stable, latest, or specific version)
  mcp                                               Configure and manage MCP servers
  plugin|plugins                                    Manage Claude Code plugins
  setup-token                                       Set up a long-lived authentication token (requires Claude subscription)
  update|upgrade                                    Check for updates and install if available


And finally, let's launch it:

% claude
Welcome to Claude Code v2.1.100
…………………………………………………………………………………………………………………………………………………………

     *                                       █████▓▓░
                                 *         ███▓░     ░░
            ░░░░░░                        ███▓░
    ░░░   ░░░░░░░░░░                      ███▓░
   ░░░░░░░░░░░░░░░░░░░    *                ██▓░░      ▓
                                             ░▓▓███▓▓░
 *                                 ░░░░
                                 ░░░░░░░░
                               ░░░░░░░░░░░░░░░░
       █████████                                        *
      ██▄█████▄██                        *
       █████████      *
…………………█ █   █ █………………………………………………………………………………………………………………

 Let's get started.

 Choose the text style that looks best with your terminal
 To change this later, run /theme

 ❯ 1. Dark mode ✔
   2. Light mode
   3. Dark mode (colorblind-friendly)
   4. Light mode (colorblind-friendly)
   5. Dark mode (ANSI colors only)
   6. Light mode (ANSI colors only)

 ╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌
  1  function greet() {
  2 -  console.log("Hello, World!");                                   
  2 +  console.log("Hello, Claude!");                                  
  3  }
╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌
  Syntax theme: Monokai Extended (ctrl+t to disable)

After that we need to select a login method:


 ❯ 1. Claude account with subscription · Pro, Max, Team, or Enterprise

   2. Anthropic Console account · API usage billing

   3. 3rd-party platform · Amazon Bedrock, Microsoft Foundry, or Vertex AI


Option 1 - Claude accounts are for the consumer/Pro subscription plans tied to the claude.ai web interface, which are seat-based.

Option 2 - Anthropic Console account should be selected if your organization is on an API plan (pay-as-you-go billing based on token usage). Anthropic Console (platform.claude.com) is the hub for managing API keys, billing, and developer organizations.

Option 3 - 3rd-party platforms are for when you want to route Claude through your existing AWS (Bedrock), Google Cloud (Vertex AI), or Microsoft Foundry billing.
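For option 3, the cloud-provider route is configured via environment variables rather than the interactive login. A sketch for Bedrock, based on my reading of the Claude Code docs at time of writing — verify the variable names against the current docs, and note the region is a placeholder:

```shell
# Route Claude Code through Amazon Bedrock instead of the Anthropic API.
# Variable names per the Claude Code docs at time of writing;
# the region is a placeholder — pick one where you have Claude access.
export CLAUDE_CODE_USE_BEDROCK=1
export AWS_REGION=us-east-1

# Google Cloud works analogously with CLAUDE_CODE_USE_VERTEX=1.
```

With these set, claude authenticates with your AWS credentials and bills against your Bedrock account instead of an Anthropic plan.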


After selecting Anthropic Console, you'll be taken to a page which shows the following:


Claude Code would like to connect to your Anthropic organization MYORG

YOUR ACCOUNT WILL BE USED TO:
    • Generate API keys on your behalf
    • Access your Anthropic profile information
    • Upload files on your behalf

Logged in as user@myorg.com
Switch account


After clicking the Authorize button, you'll be redirected to a page which shows:


Build something great
You’re all set up for Claude Code.

You can now close this window.


Back in terminal, you'll see:

Logged in as user@myorg.com                                           
Login successful. Press Enter to continue…   

After pressing Enter:

 Security notes:                                                        
 1. Claude can make mistakes                                          
    You should always review Claude's responses, especially when       
    running code.                                                                                                                             
 2. Due to prompt injection risks, only use it with code you trust    
    For more details see:                                              
    https://code.claude.com/docs/en/security
                                                                
 Press Enter to continue…   


After pressing Enter again:

Use Claude Code's terminal setup?                                                                                                   
 For the optimal coding experience, enable the recommended settings    
 for your terminal: Shift+Enter for newlines                            
 ❯ 1. Yes, use recommended settings                                    
   2. No, maybe later with /terminal-setup                                                                                                     
 Enter to confirm · Esc to skip   


After choosing 1 - recommended settings:

 Accessing workspace:
                                                     
 /Users/bojan/path/to/project
                                               
 Quick safety check: Is this a project you created or one you trust? (Like your own code, a well-known open source project, or work from your team). If not, take a
 moment to review what's in this folder first.         
                                                      
 Claude Code'll be able to read, edit, and execute files here.
                                                          
 Security guide                                           
                                                         
 ❯ 1. Yes, I trust this folder            
   2. No, exit         
                                                          
 Enter to confirm · Esc to cancel


After selecting 1:

╭─── Claude Code v2.1.100───────────────────────────────────────────────────────────────────────────────────────╮
│                                            │ Tips for getting started                                          
│             Welcome back User!             │ Run /init to create a CLAUDE.md file with instructions for Claude│
│                                            │ ─────────────────────────────────────────────────────────────────│
│                   ▐▛███▜▌                  │ Recent activity                                                  │
│                  ▝▜█████▛▘                 │ No recent activity                                               │
│                    ▘▘ ▝▝                   │                                                                  │
│                                            │                                                                  │
│   Sonnet 4.6 · API Usage Billing · MYORG   │                                                                   
│   ~/…/path/to/project                      │                                                                   
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
                                                          
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────
❯                                         
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  ? for shortcuts                                                                                                                                       ● high · /effort
                                                                                                                                                                          
                                    
We can now run various commands, like:

────────────────────────────────────────────────────────────
❯ /stats                                  
────────────────────────────────────────────────────────────
/stats                  Show your Claude Code usage statistics and activity            
/status                 Show Claude Code status including version, model, account, API connectivity, and tool statuses  
/statusline             Set up Claude Code's status line UI
/ide                    Manage IDE integrations and show status    


If we execute /stats at this point, the output will show:

❯ /stats                                                                
────────────────────────────────────────────────────────────   
Status   Config   Usage   Stats                                        

No stats available yet. Start using Claude Code!   


In my case, the Status tab showed, among other things:

  IDE: ✘ Error installing VS Code extension: 1: Command failed with ERR_STREAM_PREMATURE_CLOSE: code --force --install-extension anthropic.claude-code
       Premature close
       Please restart your IDE and try again.


I restarted VS Code, to no avail. I then manually installed the Claude Code for VS Code extension and restarted VS Code, but the same error appeared again. There is a related bug, still with Open status: [BUG] Claude code VS Code extension error in MacOS · Issue #34639 · anthropics/claude-code

If we try /cost:

❯ /stats 
  ⎿  Status dialog dismissed

❯ /cost                                                                
  ⎿  Total cost:            $0.0000
     Total duration (API):  0s                                        
     Total duration (wall): 1h 16m 21s                                 
     Total code changes:    0 lines added, 0 lines removed            
     Usage:                 0 input, 0 output, 0 cache read, 0 cache write