Thursday, 25 June 2026

Elastic Fleet


Elastic Agents do not strictly require Fleet and Fleet Server. You can deploy them in standalone mode, which allows you to manually configure and manage them using local configuration files.

However, running Elastic Agents in Fleet-managed mode with a Fleet Server is the recommended best practice for most enterprise environments.

Here is how the two approaches compare:

1. Fleet-Managed Mode (Recommended)


In this setup, you use the Fleet UI in Kibana to centrally manage agent policies, roll out upgrades, and apply integrations.
  • How it works: Agents connect to a Fleet Server (which is just a specialized Elastic Agent process). Fleet Server reads policies from Elasticsearch and delivers them to agents over the connections the agents open when they check in.
  • Best for: Large-scale deployments, continuous monitoring, and Elastic Security/Endpoint integrations.
  • Action: To set this up, refer to the official Elastic Agent Installation Guide.


2. Standalone Mode


In this setup, you manually install the agent and define its inputs, outputs, and integrations directly in a local YAML configuration file.

  • How it works: The agent connects directly to outputs like Elasticsearch or Logstash without a Fleet Server intermediary.
  • Limitations: Central management, automated upgrades, and certain advanced Endpoint Security features are disabled.
  • Best for: Edge cases, highly air-gapped networks, or evaluating specific integrations.
  • Action: To configure this, use the steps outlined in the Standalone Elastic Agent Tutorial.


Example: how an Elastic Fleet deployment can actually be wired


Components & where they live


Everything is in cluster company-prod-elastic-eks (EKS, us-east-2, now on v1.36.2), namespace elastic-system, across 5 nodes in 3 AZs:

Node            AZ  Node Group             Runs
====            ==  ==========             ====
ip-10-99-44-1   2a  es MNG (m7g.2xlarge)   agent
ip-10-99-55-34  2b  default MNG (m5.large) agent + Fleet Server pod
ip-10-99-55-89  2b  es MNG (m7g.2xlarge)   agent
ip-10-99-66-51  2c  es MNG (m7g.2xlarge)   agent
ip-10-99-66-78  2c  default MNG (m5.large) agent


  • Fleet Server: Deployment fleet-server-prod, replicas: 1 (single pod, currently on the 2b node). Listens on :8220 HTTPS. Exposed by a NodePort service fleet-server-prod-agent-http (node port 32202 on every node --> container port 8220).
  • Elastic Agents: DaemonSet agent-prod-eck-agent, one pod per node (5 total), spread across all 3 AZs. hostNetwork: true, mode: fleet, FLEET_INSECURE=true. They collect node/pod logs + metrics (hostPath mounts of /var/log/...).
  • ALB k8s-elastics-eckfleet-a1b2c3d4e5: internal (not internet-facing), spanning subnets in 2a / 2b / 2c. Created by the AWS LB controller from Ingress eck-fleet-server-prod-ingress. Targets = all 5 nodes' NodePort 32202 (instance mode), backend-protocol: HTTPS, idle_timeout: 300s.
  • DNS: eck-fleet-server.internal-domain.local (Route53 via external-dns) --> the internal ALB. Elasticsearch is a separate endpoint, elasticsearch.internal-domain.local:443.

Who listens, who initiates


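The key thing: every connection involving an agent is initiated by the agent (outbound). Fleet Server never opens a connection toward an agent. Policy changes travel back over connections the agents themselves opened. That's why this works with agents on hostNetwork and no inbound rules of their own.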

                          

                           ┌───────────────────────────────────────────────┐
                           │ cluster company-prod-elastic-eks / elastic-sys│
     ┌──────────┐          │                                               │
     │ Agent    │ check-in │   internal ALB          Fleet Server (1 pod)  │
     │ DaemonSet│─────────►│  eckfleet-a1b2...   ┌──────► :8220 (HTTPS)    │
     │ (5 pods, │  :443    │  :443 TLS  ─────────┘  NodePort 32202→8220    │
     │  3 AZs)  │  HTTPS   │  (internal cert)       kube-proxy → the 1 pod │
     └────┬─────┘          │       ▲  re-encrypt HTTPS to a node's :32202  │
          │                │       │                     │                 │
          │ ship data      └───────┼─────────────────────┼─────────────────┘
          │ (logs/metrics)         │ DNS                 │ writes .fleet-* / agent
          ▼                   eck-fleet-server.          ▼ metadata, reads policy
 elasticsearch.               internal-domain.local  Elasticsearch + Kibana (Fleet app)
 internal-domain.local:443 ◄──────────────────────  
 (NOT via the fleet ALB)



Traffic Flows: Breakdown


1. Agent --> Fleet Server (Control Plane: Enrollment & Policy Check-in)

The agent acts as the client, and Fleet Server listens on :8220. The agent dials https://eck-fleet-server.internal-domain.local:443:

Traffic Path:

agent --> DNS --> internal ALB :443 (TLS termination, internal cert) --> re-encrypt --> a node's NodePort :32202 -->  kube-proxy --> the single Fleet Server pod :8220

Cross-AZ Routing: 

Because the agent always targets the ALB DNS name (not the pod directly), a check-in can land on any node's NodePort. kube-proxy then forwards it to whichever node hosts the single Fleet Server pod, so cross-AZ hops are standard behavior.
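To make the cross-AZ point concrete, here is a minimal Python sketch that classifies the kube-proxy hop for a check-in landing on each node in the table above. It makes two assumptions. First, the ALB spreads check-ins evenly over the 5 NodePort targets. Second, the Service uses the Kubernetes default externalTrafficPolicy: Cluster, so any node forwards to the pod. The NODES map and the classify_hop/hop_mix names are illustrative and not part of the deployment.

```python
from collections import Counter

# Node -> AZ, copied from the node table; the Fleet Server pod runs on ip-10-99-55-34.
NODES = {
    "ip-10-99-44-1": "2a",
    "ip-10-99-55-34": "2b",
    "ip-10-99-55-89": "2b",
    "ip-10-99-66-51": "2c",
    "ip-10-99-66-78": "2c",
}
FLEET_SERVER_NODE = "ip-10-99-55-34"


def classify_hop(entry_node: str) -> str:
    """Classify the kube-proxy hop for a request arriving on entry_node's NodePort."""
    if entry_node == FLEET_SERVER_NODE:
        return "local"      # the pod runs on the node that received the request
    if NODES[entry_node] == NODES[FLEET_SERVER_NODE]:
        return "same-az"    # forwarded to the other node in the same AZ
    return "cross-az"       # forwarded into a different AZ


def hop_mix() -> Counter:
    """Count hop types, assuming each of the 5 targets gets an equal share of check-ins."""
    return Counter(classify_hop(node) for node in NODES)


print(hop_mix())
```

With the Fleet Server pod on ip-10-99-55-34, only 1 of the 5 targets serves the check-in locally. One more forwards it to the other 2b node, and the remaining 3 send it across into 2a or 2c.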

The Long Poll Mechanism: The check-in is a long poll. The agent opens a connection, and Fleet Server holds it open for up to ~5 minutes until a policy change occurs or the poll times out.

The Latency "False Positive": This held-open duration is exactly what the ALB records as a TargetResponseTime of ~300s, which triggered the false-positive alert. The ALB's idle_timeout is intentionally set to 300s to support these long polls. By contrast, the health check (/health, expects 404, 300s interval) is a separate lightweight probe. That is why the targets report as healthy while the "latency" metric looks badly skewed.


2. Agent --> Elasticsearch (Data Plane: Telemetry Shipping)

This data path does not traverse the Fleet ALB. The Fleet output configuration points agents directly at https://elasticsearch.internal-domain.local:443.

Consequently, the eckfleet ALB only handles control-plane check-in traffic. This explains why its request volume is incredibly low (~3–5 requests/min) while individual request durations hover around 300s. They are two entirely separate paths utilizing two different load balancers.


3. Fleet Server --> Elasticsearch + Kibana

Fleet Server (via elasticsearchRefs / kibanaRef) acts as the client here. It writes agent and policy metadata directly into the .fleet-* indices and coordinates with the Kibana Fleet application, where the eck-fleet-server and eck-agent policies are managed.


One-Line Summary


Agents (DaemonSet, 5 pods across 2a/2b/2c) long-poll the single Fleet Server pod for policy updates via the internal ALB on :443 --> NodePort 32202 --> :8220. They ship their actual telemetry directly to Elasticsearch, bypassing that ALB entirely. The ALB only carries the slow, held-open control traffic, which is the root cause of the alarm.



Why does Fleet Server hold the connection open for so long? Why doesn't it reply immediately?


It's a deliberate design choice called long polling, and it's the opposite of a bug. Here's the reasoning.

The problem Fleet is solving

Fleet Server's job is to push policy changes out to agents promptly — a new integration, a changed log path, an output credential rotation. Agents need to find out "has my policy changed?" with low latency.

There are three ways an agent could learn about changes:

  1. Short polling — agent asks "anything new?" every N seconds, server replies instantly "no", agent sleeps, repeats.
  2. Server push — server opens a connection to each agent when something changes.
  3. Long polling — agent asks "anything new?", and the server holds the request open until either something actually changes or a timeout fires.

Fleet uses #3, and the connection sitting open for ~5 minutes is that hold.

Why not reply immediately (short polling)?


If Fleet answered every check-in instantly with "nothing changed", then to get fast reaction to a policy change, agents would have to poll very frequently — say every few seconds. With your 5 agents that's tolerable, but Fleet is built to manage thousands to tens of thousands of agents from one server. At that scale:

  - Frequent short polls = a constant storm of requests, almost all of which return "no change". Huge wasted CPU/network on both ends.
  - To keep reaction time low you'd poll more often, which makes the storm worse. To cut the storm you'd poll less often, which makes policy changes take longer to land. There's no good setting.

Long polling breaks that trade-off: the agent gets a near-instant reaction to a real change (the server responds the moment policy updates) and there's almost no idle chatter (one held-open connection per agent instead of hundreds of empty round-trips).
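The trade-off can be put in numbers with a back-of-the-envelope Python sketch. The fleet size (10,000 agents) and the 5-second short-poll interval are hypothetical; 300 s is the poll ceiling discussed in this note. The sketch counts only check-in requests and ignores acks, reconnect overhead, and jitter.

```python
def requests_per_second(agents: int, seconds_per_request: float) -> float:
    """Average request rate when every agent sends one request per period."""
    return agents / seconds_per_request


def compare(agents: int, short_poll_interval_s: float, long_poll_ceiling_s: float) -> dict:
    return {
        # Short polling: one request per interval, nearly all answered "no change".
        "short_rps": requests_per_second(agents, short_poll_interval_s),
        # Worst-case wait before a short-polling agent notices a change.
        "short_max_delay_s": short_poll_interval_s,
        # Long polling with no changes: one request per poll ceiling.
        "long_rps": requests_per_second(agents, long_poll_ceiling_s),
        # The server answers as soon as the policy changes (network time ignored).
        "long_max_delay_s": 0.0,
        # Fewer requests by this factor (the agent count cancels out).
        "request_reduction": long_poll_ceiling_s / short_poll_interval_s,
    }


print(compare(agents=10_000, short_poll_interval_s=5, long_poll_ceiling_s=300))
```

With these inputs, short polling produces 2,000 requests/s and up to 5 s of delay. Long polling produces about 33 requests/s with near-immediate delivery: 60 times fewer requests and a faster reaction at the same time.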

Why not server push (#2)?


Pushing would mean the server initiating connections inward to every agent. Agents are all over the place — behind NAT, firewalls, in private subnets, on laptops, on hostNetwork pods like yours. The server usually can't reach them, and you'd need inbound rules everywhere. Long polling flips it: the agent always dials out to the server, the connection is already established and held open, and the server pushes the change down that existing agent-initiated connection the instant it happens. You get push-like latency with poll-like (outbound-only) connectivity. That's exactly why your agents work behind hostNetwork with no inbound exposure.

 So what's actually happening in those ~300s


A check-in is essentially: "Here's my current state; tell me the moment my policy differs." The server parks that request. Two things can end it:

  - A policy change occurs → server responds immediately with the new policy (could be 2 seconds in).
  - Nothing changes → the server lets the request time out at its poll ceiling (~5 min), responds "no change", and the agent immediately opens a fresh one.

Since your policies rarely change, almost every check-in runs the full clock and returns at ~300s. That's the held connection the ALB measures as TargetResponseTime.
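The two ways a check-in can end map directly onto a condition-variable wait with a deadline. Below is a minimal Python sketch of that long-poll pattern. Fleet Server itself is written in Go and its real check-in handler differs. PolicyStore, check_in, and the integer revision numbers are hypothetical, and the poll ceiling is shrunk from ~300 s to fractions of a second.

```python
import threading
import time


class PolicyStore:
    """Toy long-poll server: holds check-ins until the policy revision changes."""

    def __init__(self) -> None:
        self._revision = 1
        self._cond = threading.Condition()

    def update_policy(self) -> None:
        # A policy change bumps the revision and wakes every parked check-in.
        with self._cond:
            self._revision += 1
            self._cond.notify_all()

    def check_in(self, agent_revision: int, poll_timeout_s: float) -> tuple[str, int]:
        """Park the request until the revision differs from the agent's or the ceiling expires."""
        deadline = time.monotonic() + poll_timeout_s
        with self._cond:
            while self._revision == agent_revision:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    # Poll ceiling reached: answer "no change"; the agent re-polls at once.
                    return ("no change", self._revision)
                self._cond.wait(timeout=remaining)
            # The revision moved: answer immediately with the new policy.
            return ("new policy", self._revision)


store = PolicyStore()
# Nothing changes: the call is held for the whole (shortened) ceiling, like the ~300 s check-ins.
print(store.check_in(agent_revision=1, poll_timeout_s=0.2))
# A change 0.1 s into the poll ends the hold early.
threading.Timer(0.1, store.update_policy).start()
print(store.check_in(agent_revision=1, poll_timeout_s=5))
```

The first call prints ('no change', 1) after the full 0.2 s. The second returns ('new policy', 2) about 0.1 s in, well before its 5 s ceiling.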

Why this specifically fools the ALB


 From the ALB's point of view, "request received → response sent" took 300 seconds, so it reports TargetResponseTime ≈ 300s. The ALB can't tell the difference between "the backend was slow for 300s" (bad) and "the backend intentionally held an idle long-poll for 300s" (normal). That ambiguity is the whole reason your generic 0.8s threshold misfires — and why the right fix (DOP-833) is to exempt this endpoint rather than treat it as latency. It's also why the Ingress sets idle_timeout.timeout_seconds=300: the ALB has to be told to tolerate the held connection, otherwise it would sever the long poll before the agent's poll cycle completes.
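A minimal Python sketch of the kind of exemption described above. It applies a generic TargetResponseTime rule with the 0.8 s threshold from the text and skips load balancers whose backends long-poll by design. The Datapoint type, the LONG_POLL_EXEMPT set, and should_alarm are hypothetical. This is not the actual DOP-833 change, and matching on the load balancer name is an assumption.

```python
from dataclasses import dataclass

# Hypothetical exemption list: ALBs whose backends hold requests open by design.
LONG_POLL_EXEMPT = {"k8s-elastics-eckfleet-a1b2c3d4e5"}


@dataclass(frozen=True)
class Datapoint:
    load_balancer: str              # which ALB the metric belongs to
    target_response_time_s: float   # TargetResponseTime for the period, in seconds


def should_alarm(dp: Datapoint, threshold_s: float = 0.8) -> bool:
    """Generic latency rule that skips endpoints where ~300 s responses are normal."""
    if dp.load_balancer in LONG_POLL_EXEMPT:
        return False  # a held long poll, not a slow backend
    return dp.target_response_time_s > threshold_s


print(should_alarm(Datapoint("k8s-elastics-eckfleet-a1b2c3d4e5", 300.0)))  # False
print(should_alarm(Datapoint("some-web-alb", 1.2)))                      # True
```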

In short: Fleet holds the connection open so it can deliver policy changes near-instantly without either hammering the server with empty polls or needing to reach inward to firewalled agents. The ~300s is just an idle long-poll waiting for a change that usually never comes during that window — efficient by design, and only a problem for a latency metric that doesn't know to expect it.


----

AWS EC2: Application Load Balancer

 



An Application Load Balancer (ALB) is a fully managed AWS service that automatically distributes incoming HTTP and HTTPS traffic across multiple backend targets.

It operates at the Application Layer (Layer 7) of the Open Systems Interconnection (OSI) model.

Key Features:

  • Content-Based Routing: Routes traffic based on URL paths (/api vs /images) or hostnames (for example, different subdomains of example.com).
  • Container Support: Integrates directly with Amazon ECS and EKS using dynamic port mapping.
  • Advanced Protocols: Native support for modern protocols like HTTP/2, gRPC, and WebSockets.
  • Security Integration: Features built-in HTTPS/TLS termination and integrates directly with AWS WAF for web security.

How Components Work Together


  • Listener: Evaluates connection requests from clients using protocols and ports you configure.
  • Rules: Determines how the load balancer routes requests to its registered targets.
  • Target Group: Groups backend resources (like EC2 instances, containers, or IP addresses) that receive the traffic
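The listener, rules, and target group chain can be sketched in Python as "first matching rule by priority wins, otherwise the listener's default action". Rule, route, and the target group names are hypothetical. Path matching is simplified to a prefix check; real ALB path-pattern conditions use wildcards such as /api/*.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    priority: int                      # lower number is evaluated first
    target_group: str                  # forward action of the rule
    host: Optional[str] = None         # simplified exact host-header condition
    path_prefix: Optional[str] = None  # simplified stand-in for a path-pattern condition

    def matches(self, host: str, path: str) -> bool:
        # Every condition that is set must match (conditions are ANDed).
        if self.host is not None and host != self.host:
            return False
        if self.path_prefix is not None and not path.startswith(self.path_prefix):
            return False
        return True


def route(rules: list[Rule], default_target_group: str, host: str, path: str) -> str:
    """First matching rule in priority order wins; otherwise use the default action."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.matches(host, path):
            return rule.target_group
    return default_target_group


rules = [
    Rule(priority=20, target_group="images-tg", path_prefix="/images"),
    Rule(priority=10, target_group="api-tg", path_prefix="/api"),
    Rule(priority=5, target_group="admin-tg", host="admin.example.com"),
]
print(route(rules, "web-tg", "example.com", "/api/orders"))        # api-tg
print(route(rules, "web-tg", "admin.example.com", "/api/orders"))  # admin-tg (priority 5 wins)
print(route(rules, "web-tg", "example.com", "/"))                  # web-tg (default action)
```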

How do ALB health checks keep applications online?


An Application Load Balancer (ALB) keeps your application online by continuously monitoring the health of your backend targets and dynamically redirecting traffic away from failing nodes.

1. Automatic Traffic Redirection


The ALB sends periodic health check requests (HTTP or HTTPS) to every registered target. If a target fails the configured number of consecutive checks (the unhealthy threshold), the ALB marks it as unhealthy and stops routing new requests to it. Traffic is rerouted to the remaining healthy targets, so users typically see no downtime as long as enough healthy capacity remains.

2. Auto Scaling Integration


When paired with an Auto Scaling Group (ASG) whose health check type is set to ELB, ALB health checks can trigger the automatic replacement of broken instances.
  • The Problem: An EC2 instance might be running (healthy at the hardware level), but the web server inside it has crashed (unhealthy at the application level).
  • The Solution: The ALB tells the ASG that the instance is failing application health checks. The ASG terminates that specific broken instance and launches a fresh, working one.

3. Graceful Recovery


When an unhealthy instance recovers, or when a new instance is registered, the ALB does not send traffic to it immediately. A newly registered target starts in the initial state. Any target must then pass the configured number of consecutive health checks (the healthy threshold). Only when it reaches that threshold does the ALB safely introduce it back into the traffic rotation.

How to Configure an ALB Health Check


You configure health checks inside the Target Group settings using these parameters:

Parameter            What it does                                                   Recommended Setting
=========            ============                                                   ===================
Health Check Path    The URL endpoint the ALB hits (e.g., /health or /index.html).  /health
Healthy Threshold    Consecutive successes needed to mark a target as healthy.      3
Unhealthy Threshold  Consecutive failures needed to mark a target as unhealthy.     2
Timeout              How long the ALB waits for a response before failing.          5 seconds
Interval             The time between individual health check pings.                30 seconds
Success Codes        The HTTP status codes that prove the app is working.           200 (or 200-399)
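A toy Python model of the threshold logic in the table. Consecutive results flip a target between healthy and unhealthy, and a new target starts in the initial state without traffic. TargetHealth is hypothetical and simplified; it ignores timeouts and other edge cases of the real ELB state machine.

```python
from dataclasses import dataclass, field


@dataclass
class TargetHealth:
    """Toy model: consecutive health-check results move a target between states."""
    healthy_threshold: int = 3      # from the table above
    unhealthy_threshold: int = 2    # from the table above
    state: str = "initial"          # newly registered targets start here, without traffic
    _ok_streak: int = field(default=0, init=False, repr=False)
    _fail_streak: int = field(default=0, init=False, repr=False)

    def record(self, passed: bool) -> str:
        """Feed one health-check result and return the resulting state."""
        if passed:
            self._ok_streak += 1
            self._fail_streak = 0
            if self.state != "healthy" and self._ok_streak >= self.healthy_threshold:
                self.state = "healthy"
        else:
            self._fail_streak += 1
            self._ok_streak = 0
            if self.state != "unhealthy" and self._fail_streak >= self.unhealthy_threshold:
                self.state = "unhealthy"
        return self.state

    @property
    def receives_traffic(self) -> bool:
        return self.state == "healthy"


target = TargetHealth()
print([target.record(r) for r in [True, True, True, False, False, True]])
```

The sequence prints ['initial', 'initial', 'healthy', 'healthy', 'unhealthy', 'unhealthy']. With the table's settings (interval 30 s, unhealthy threshold 2, healthy threshold 3), a dead target leaves the rotation after roughly 2 × 30 s = 60 s. A recovered target returns after roughly 3 × 30 s = 90 s.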



When you configure an ALB, you do not select an Availability Zone (AZ) directly; instead, you must select at least two subnets in different Availability Zones to ensure high availability.

  • How it works: AWS places a load balancer node in each of the specified subnets.
  • The AZ link: Because each subnet belongs to exactly one AZ, this fundamentally binds the ALB's nodes to those corresponding Availability Zones.
  • Changing AZs: You can modify the subnets via the Update Availability Zones settings in the EC2 Console at any time.
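The subnet rules for an ALB (subnets in at least two AZs, at most one subnet per AZ) can be checked with a small Python helper. validate_alb_subnets is hypothetical. The subnet IDs are placeholders, and the caller is assumed to already know each subnet's AZ.

```python
def validate_alb_subnets(subnet_az: dict[str, str]) -> list[str]:
    """Return problems with a subnet -> AZ selection for an ALB (empty list means OK)."""
    problems = []
    azs = list(subnet_az.values())
    if len(set(azs)) < 2:
        problems.append("select subnets in at least two Availability Zones")
    # An ALB takes at most one subnet per AZ.
    for az in sorted({az for az in azs if azs.count(az) > 1}):
        problems.append(f"only one subnet per Availability Zone is allowed ({az})")
    return problems


print(validate_alb_subnets({"subnet-0001": "us-east-2a", "subnet-0002": "us-east-2b"}))  # []
print(validate_alb_subnets({"subnet-0001": "us-east-2a", "subnet-0003": "us-east-2a"}))
```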

Public ALB


Binding an ALB to public subnets makes it a public (internet-facing) load balancer. 

When you create an internet-facing ALB, AWS requires you to select public subnets so the ALB nodes can receive a public IP address and route traffic from the internet.

Key Characteristics:

  • Public DNS: The ALB receives a public DNS name that resolves to public IP addresses.
  • Internet Gateway: The selected public subnets must have a route to an Internet Gateway (IGW) in their route tables.
  • Target Routing: Even though the ALB is public, it can still route traffic to EC2 instances living in private subnets.

An internet-facing ALB routes traffic directly to the individual backend targets (such as EC2 instances or IP addresses), not to the private subnets themselves.


How Routing Works

  • Target Group Config: You configure the ALB to route traffic to a Target Group.
  • Direct Node Communication: The ALB nodes in the public subnets communicate directly with the private IP addresses of your backend nodes.
  • Cross-Subnet Traffic: AWS handles this routing internally via the VPC router, allowing the public ALB to securely traverse into private subnets.

Configuration Checklist

  • VPC: Both the public subnets (where the ALB lives) and the private subnets (where the nodes live) must be in the same VPC.
  • Security Groups: The private instances must have a security group that allows inbound traffic from the ALB's security group



Private ALB



An internal (private) ALB routes traffic in the exact same way as a public ALB, but it is only accessible within your VPC or connected networks.

It routes traffic directly to individual backend targets, not to subnets.

Key Characteristics

  • Private Subnets: You deploy the ALB nodes into private subnets.
  • DNS: The ALB's DNS name is publicly resolvable, but it resolves exclusively to the private IP addresses of its nodes.
  • No Internet Access: It cannot receive any traffic from the public internet because it lacks a public IP.
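One quick way to confirm that a load balancer is internal is to resolve its DNS name and check that every returned address is private. Below is a Python sketch using only the standard library. resolve and all_private are illustrative helpers, and the sample addresses in the usage lines are made up.

```python
import ipaddress
import socket


def resolve(hostname: str, port: int = 443) -> list[str]:
    """Return the unique IP addresses the hostname resolves to."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def all_private(addresses: list[str]) -> bool:
    """True if every address is in a private range, as expected for an internal ALB."""
    return all(ipaddress.ip_address(a).is_private for a in addresses)


# Hypothetical example values; in practice pass resolve("<your ALB DNS name>").
print(all_private(["10.99.44.10", "10.99.66.20"]))  # True
print(all_private(["10.99.44.10", "3.5.140.2"]))    # False
```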

Common Use Cases

  • Internal Microservices: Routing traffic from a public-facing web tier to a private backend API tier.
  • Hybrid Networks: Routing traffic coming from an on-premises data centre via AWS Direct Connect or a VPN

Setting Up ALB in AWS Console


The basic building blocks of an AWS load balancer are listeners and target groups.

To create an Application Load Balancer, go to EC2 >> Load balancers >> Create load balancer >> Select load balancer type (click Create under Application Load Balancer).



Here we can set:

  • Basic configuration
    • Name
    • Scheme (cannot be changed after the load balancer is created)
      • Internet-facing. An internet-facing load balancer routes requests from clients over the internet to targets. Requires a public subnet. 
      • Internal. An internal load balancer routes requests from clients to targets using private IP addresses.
    • IP address type. Select the type of IP addresses that your subnets use.
      • IPv4. Recommended for internal load balancers.
      • Dualstack. Includes IPv4 and IPv6 addresses.
  • Network mapping. The load balancer routes traffic to targets in the selected subnets, and in accordance with your IP address settings.
    • VPC. Virtual private cloud for your targets. If the load balancer is internet-facing, only VPCs with an internet gateway are enabled for selection. The selected VPC cannot be changed after the load balancer is created. Because a VPC is region-specific, so is an Application Load Balancer.
    • Mappings. Once VPC is selected, its availability zones are listed here and are selectable. Select at least two Availability Zones and one subnet per zone. The load balancer routes traffic to targets in these Availability Zones only. Availability Zones that are not supported by the load balancer or the VPC are not available for selection. We should select all AZs that we listed in the Auto scaling group (if we used it).
  • Security groups. A security group is a set of firewall rules that control the traffic to your load balancer. We can select up to 10 security groups.
    • If our application is listening for HTTP requests on port 80 we should select a security group with:
      • Inbound rule: accept HTTP/TCP traffic on port 80 with source Anywhere-IPv4
      • Outbound rule: allow all traffic for all protocols and port ranges to custom destination 0.0.0.0/0
  • Listeners and routing. A listener is a process that checks for connection requests using the port and protocol you configure. The rules that you define for a listener determine how the load balancer routes requests to its registered targets.
    • Add listener
      • Protocol e.g. HTTP
      • Port e.g. 80. This is the port clients connect to, and it does not need to be the same as the port of the attached target group. E.g. the LB can listen on port 80 and forward traffic to target group port 8080.
      • Default action: Forward to (select a target group)
      • Add listener tags
  • Add-on services - optional
    • AWS Global Accelerator
  • Tags - optional


More info on Scheme, from AWS documentation:

When you create a load balancer, you must choose whether to make it an internal load balancer or an internet-facing load balancer.

The nodes of an internet-facing load balancer have public IP addresses.

The nodes of an internal load balancer have only private IP addresses.

Both internet-facing and internal load balancers route requests to your targets using private IP addresses. Therefore, your targets do not need public IP addresses to receive requests from an internal or an internet-facing load balancer.

More info on how ALB routes traffic to multiple Availability Zones (and about what Load Balancer Nodes are):

When you enable an Availability Zone for your load balancer, Elastic Load Balancing creates a load balancer node in the Availability Zone. 

The nodes for your load balancer distribute requests from clients to registered targets. When cross-zone load balancing is enabled, each load balancer node distributes traffic across the registered targets in all enabled Availability Zones. When cross-zone load balancing is disabled, each load balancer node distributes traffic only across the registered targets in its Availability Zone.
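The effect of the cross-zone setting can be computed with a short Python sketch. It makes three assumptions: clients spread requests evenly across the load balancer nodes (one per enabled AZ), each node spreads evenly across the targets it may use, and every AZ has at least one target. The uneven 2/8 split and the zone names are chosen for illustration.

```python
from fractions import Fraction


def per_target_share(targets_per_az: dict[str, int], cross_zone: bool) -> dict[str, Fraction]:
    """Fraction of all traffic that each single target in a given AZ receives.

    Assumes one load balancer node per AZ, clients split evenly across the nodes,
    and at least one target in every AZ.
    """
    if cross_zone:
        # Every node spreads its traffic over all targets in all enabled AZs.
        total_targets = sum(targets_per_az.values())
        return {az: Fraction(1, total_targets) for az in targets_per_az}
    # Each node only uses the targets in its own AZ.
    node_share = Fraction(1, len(targets_per_az))
    return {az: node_share / count for az, count in targets_per_az.items()}


split = {"zone-a": 2, "zone-b": 8}
print(per_target_share(split, cross_zone=True))
print(per_target_share(split, cross_zone=False))
```

With cross-zone load balancing on, each of the 10 targets gets 10% of the traffic. With it off, each target in zone-a gets 25% and each target in zone-b only 6.25%. ALBs have cross-zone load balancing enabled by default.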

Before a client sends a request to your load balancer, it resolves the load balancer's domain name using a Domain Name System (DNS) server. The DNS entry is controlled by Amazon, because your load balancers are in the amazonaws.com domain. The Amazon DNS servers return one or more IP addresses to the client. These are the IP addresses of the load balancer nodes for your load balancer.

As traffic to your application changes over time, Elastic Load Balancing scales your load balancer and updates the DNS entry. The DNS entry also specifies the time-to-live (TTL) of 60 seconds. This helps ensure that the IP addresses can be remapped quickly in response to changing traffic.

The client determines which IP address to use to send requests to the load balancer. The load balancer node that receives the request selects a healthy registered target and sends the request to the target using its private IP address.

With Application Load Balancers, the load balancer node that receives the request uses the following process:

1) Evaluates the listener rules in priority order to determine which rule to apply.

2) Selects a target from the target group for the rule action, using the routing algorithm configured for the target group. The default routing algorithm is round robin. Routing is performed independently for each target group, even when a target is registered with multiple target groups.
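The point that routing is independent per target group can be sketched in Python with one round-robin iterator per target group. TargetGroup, the group names, and the instance IDs are hypothetical. The sketch ignores target health and the ALB's alternative least outstanding requests algorithm.

```python
from itertools import cycle


class TargetGroup:
    """Round-robin selection with its own rotation state per target group."""

    def __init__(self, name: str, targets: list[str]) -> None:
        self.name = name
        self._rotation = cycle(targets)  # independent of every other target group

    def pick(self) -> str:
        return next(self._rotation)


# i-aaa is registered in both groups; each group rotates on its own.
api = TargetGroup("api-tg", ["i-aaa", "i-bbb"])
web = TargetGroup("web-tg", ["i-aaa", "i-ccc"])
print([api.pick(), web.pick(), api.pick(), web.pick()])
```

This prints ['i-aaa', 'i-aaa', 'i-bbb', 'i-ccc']. api-tg picks i-aaa first even though web-tg has also just sent a request to i-aaa.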

For further info: How Elastic Load Balancing works - Elastic Load Balancing

ALB nodes use elastic network interfaces (ENIs; see Elastic network interfaces - Amazon Elastic Compute Cloud). For an internet-facing load balancer, each ENI also has a public IP address:

At least one ENI is created and attached to the balancer in each availability zone where the balancer is deployed (except NLB, which should only have one per AZ). Over the life of the balancer, new ENIs will appear and old ones will disappear, as the balancer scales horizontally (number of nodes) and/or vertically (capacity of underlying hardware), all of which is handled transparently by the infrastructure. Even though you can tag them, the tagging will become stale over time.

Source: amazon web services - AWS - Affect Load Balancer's tags to its Network Interfaces (ENI) - Stack Overflow

 

You can determine the IP addresses associated with an internal load balancer or an internet-facing load balancer by resolving the DNS name of the load balancer. These are the IP addresses where the clients should send the requests that are destined for the load balancer. However, Classic Load Balancers and Application Load Balancers use the private IP addresses associated with their elastic network interfaces as the source IP address for requests forwarded to your web servers.

Source: Find the IP address used by a load balancer to forward traffic to web servers

 

A load balancer routes requests to the targets in a target group and performs health checks on those targets. A target group is the set of targets that receive the requests the load balancer forwards. These targets can be e.g. EC2 instances created either manually or through an Auto Scaling group.

How to create a Target Group used by Load Balancer listeners? (This applies to Application, Network and Gateway Load Balancers; Classic Load Balancers do not use target groups.)

EC2 >> Target groups >> Create target group

Step 1: Specify group details

 

Here we can set:

  • Basic configuration. Settings in this section cannot be changed after the target group is created.
    • Target type
      • Instances
        • Supports load balancing to instances within a specific VPC.
        • Facilitates the use of Amazon EC2 Auto Scaling  to manage and scale your EC2 capacity.
      • IP addresses
        • Supports load balancing to VPC and on-premises resources.
        • Facilitates routing to multiple IP addresses and network interfaces on the same instance.
        • Offers flexibility with microservice based architectures, simplifying inter-application communication.
        •  Supports IPv6 targets, enabling end-to-end IPv6 communication, and IPv4-to-IPv6 NAT.
      • Lambda function
        • Facilitates routing to a single Lambda function.
        •  Accessible to Application Load Balancers only.
      • Application Load Balancer
        • Offers the flexibility for a Network Load Balancer to accept and route TCP requests within a specific VPC
        • Facilitates using static IP addresses and PrivateLink with an Application Load Balancer.
    • Target group name
    • Protocol:Port e.g. If our application is accepting HTTP requests on port 8080 this would be HTTP:8080
    • VPC - VPC with the instances that you want to include in the target group.
    • Protocol version
      • HTTP1. Send requests to targets using HTTP/1.1. Supported when the request protocol is HTTP/1.1 or HTTP/2.
      • HTTP2. Send requests to targets using HTTP/2. Supported when the request protocol is HTTP/2 or gRPC, but gRPC-specific features are not available.
      • gRPC. Send requests to targets using gRPC. Supported when the request protocol is gRPC.
  • Health checks. The associated load balancer periodically sends requests, per the settings below, to the registered targets to test their status.
    • Health check protocol
      • HTTP
      • HTTPS
    • Health check path. Use the default path of "/" to ping the root, or specify a custom path if preferred.
    • Advanced health check settings
      • Port. The port the load balancer uses when performing health checks on targets. The default is the port on which each target receives traffic from the load balancer, but you can specify a different port.
        • Traffic port
        • Override
      • Healthy threshold. The number of consecutive health check successes required before considering an unhealthy target healthy.
      • Unhealthy threshold. The number of consecutive health check failures required before considering a target unhealthy.
      • Timeout. The amount of time, in seconds, during which no response means a failed health check.
      • Interval. The approximate amount of time between health checks of an individual target
      • Success codes. The HTTP codes to use when checking for a successful response from a target. You can specify multiple values (for example, "200,202") or a range of values (for example, "200-299").
  • Attributes
  • Tags - optional
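The Success codes field accepts either a list ("200,202") or a range ("200-299"). A small Python sketch shows how such a value expands and how a health-check response is judged against it. parse_matcher and is_success are hypothetical helpers, not AWS code. This sketch also tolerates a mix of both forms, which may be more permissive than what AWS accepts.

```python
def parse_matcher(spec: str) -> set[int]:
    """Expand a success-code setting such as "200,202" or "200-299" into a set of codes."""
    codes: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            low, high = (int(x) for x in part.split("-", 1))
            if low > high:
                raise ValueError(f"invalid range: {part}")
            codes.update(range(low, high + 1))
        else:
            codes.add(int(part))
    return codes


def is_success(status: int, spec: str) -> bool:
    """True if a health-check response with this status counts as passing."""
    return status in parse_matcher(spec)


print(is_success(202, "200,202"), is_success(302, "200-299"))  # True False
```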


Step 2: Register targets

This is an optional step to create a target group. However, to ensure that your load balancer routes traffic to this target group you must register your targets.




After the load balancer is created, it stays in the provisioning state for several minutes before becoming active. After that, we can use its DNS name to see what it's doing.

If we copy its DNS name into a browser and we haven't registered any targets in the target group associated with the load balancer, we'll get error 503 Service Temporarily Unavailable.

If we've registered targets and are getting error 504 Gateway Time-out, we should first check that the security groups (firewalls) of our EC2 instances (inbound rule, source IP range) are set up correctly, as this error usually indicates that inbound traffic is not allowed.

The AWS Terraform provider can provision all of these resources.

 
How is AWS Application Load Balancing usually implemented?
 
Let's say we have our application running on 3 EC2 instances: 2 are in the same region (e.g. us-west-2) but in separate availability zones (e.g. us-west-2a and us-west-2b), and the third is in eu-central-1, in availability zone eu-central-1a.
 
VPC is region-specific but can span multiple availability zones (AZ). 
Subnet is an IP address range within VPC.
VPC can have public and private subnets.
VPC can be divided into multiple subnets but each subnet is AZ-specific.
AZ can have multiple subnets.

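So the two us-west-2 instances can belong to the same VPC, each in a different subnet because they are in different AZs. The eu-central-1 instance cannot be in that VPC: it needs its own VPC in eu-central-1. Because an ALB is regional as well, that instance is typically served by its own ALB in eu-central-1, with DNS (e.g. Route 53) or AWS Global Accelerator spreading clients across the two regions.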
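The VPC and subnet relationships above can be checked with Python's ipaddress module: each subnet must sit inside the VPC's CIDR range, and subnets must not overlap. The CIDR values are hypothetical examples, and check_subnets is an illustrative helper (IPv4 only).

```python
import ipaddress


def check_subnets(vpc_cidr: str, subnet_cidrs: list[str]) -> list[str]:
    """Report subnets that fall outside the VPC range or overlap each other."""
    vpc = ipaddress.ip_network(vpc_cidr)
    nets = [ipaddress.ip_network(c) for c in subnet_cidrs]
    problems = [f"{n} is outside {vpc}" for n in nets if not n.subnet_of(vpc)]
    for i, a in enumerate(nets):
        for b in nets[i + 1:]:
            if a.overlaps(b):
                problems.append(f"{a} overlaps {b}")
    return problems


# e.g. one subnet for us-west-2a and one for us-west-2b in the same VPC
print(check_subnets("10.0.0.0/16", ["10.0.1.0/24", "10.0.2.0/24"]))  # []
```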
 
For an internet-facing load balancer, the ALB must be in the public subnets of the VPC, as clients communicate with it via the internet (public network).
 
The load balancer is not associated directly with EC2 instances but with subnets:

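resource "aws_lb" "test" {
    load_balancer_type = "application"
    internal           = false                          # internet-facing, so public subnets
    subnets            = ["subnet-0001", "subnet-0002"] # one subnet per AZ, at least two AZs
    # ... other arguments (name, security_groups, ...) omitted
}

The target group is associated with a VPC:
 
resource "aws_lb_target_group" "test" {
    port     = 8080         # example target port (HTTP:8080)
    protocol = "HTTP"
    vpc_id   = var.vpc_id   # targets must live in this VPC
    # ... health_check and other arguments omitted
}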
 
 

Difference between ALB and NLB (Network Load Balancer)


An Application Load Balancer (ALB) and a Network Load Balancer (NLB) serve different purposes based on the layer of the network they operate on and the type of traffic they handle.

The core difference is that an ALB understands application-level traffic (Layer 7) like HTTP/HTTPS headers, while an NLB handles low-level network traffic (Layer 4) like TCP/UDP packets at extreme speeds.

Direct Comparison Matrix


Feature           Application Load Balancer (ALB)                          Network Load Balancer (NLB)
=======           ===============================                          ===========================
OSI Layer         Layer 7 (Application)                                    Layer 4 (Transport)
Protocols         HTTP, HTTPS, HTTP/2, gRPC, WebSockets                    TCP, UDP, TLS
IP Addresses      Dynamic IPs (change automatically; requires a DNS name)  Static IPs (can assign an Elastic IP per AZ)
Routing Features  Advanced (path, host, query parameters, headers)         Basic (port and IP protocol routing only)
Performance       Optimized for complex web apps                           Optimized for ultra-low latency (millions of requests/sec)


Key Technical Differences


1. Smart Routing vs. Raw Speed


  • ALB (Smart): Can read the contents of your HTTP requests. It can route API requests for example.com to an API server cluster and image requests for the same domain to a storage cluster, based on the path or hostname.
  • NLB (Fast): Does not look inside the application data. It simply looks at the protocol and target port and forwards the traffic, which results in ultra-low added latency.

2. IP Addresses and DNS


  • ALB: Scales out dynamically by adding or removing nodes, so its underlying IP addresses change frequently. Always point your domain name at the ALB's DNS name, never at a static IP.
  • NLB: Provides one static IP address per Availability Zone, and you can also assign your own Elastic IP addresses. This is critical if your corporate clients need to whitelist specific, unchanging IPs in their firewalls.

3. Client IP Preservation


  • ALB: Terminates the connection and opens a new one to your backend instances, so the backend sees the ALB's private IP. To find the real user's IP, your code must read the X-Forwarded-For HTTP header.
  • NLB: With client IP preservation enabled (the default for instance targets), passes the connection through with the client's original source IP, so your backend instances see it natively without needing extra headers.
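Reading X-Forwarded-For correctly matters because clients can send their own copy of the header. A minimal sketch, assuming the ALB is the only proxy in front of the app and uses its default behaviour of appending the address it saw to the end of the header, so the right-most entry is the trustworthy one. `clientIpFromXff` is a hypothetical helper and `socketIp` stands for the TCP peer address (the ALB itself).

```typescript
// Sketch: recover the original client IP behind an ALB from X-Forwarded-For.
// Assumes the ALB is the only proxy in front of this app and appends the
// address it saw to the END of the header (its default behaviour). Entries to
// the left of it came from the client and can be spoofed, so we take the last.
function clientIpFromXff(xff: string | undefined, socketIp: string): string {
  if (!xff) return socketIp; // no header: fall back to the TCP peer address
  const hops = xff
    .split(",")
    .map((hop) => hop.trim())
    .filter((hop) => hop.length > 0);
  return hops.pop() ?? socketIp; // right-most non-empty entry, if any
}

console.log(clientIpFromXff("203.0.113.7", "10.0.1.25")); // 203.0.113.7
console.log(clientIpFromXff("198.51.100.1, 203.0.113.7", "10.0.1.25")); // 203.0.113.7 (left entry ignored)
console.log(clientIpFromXff(undefined, "10.0.1.25")); // 10.0.1.25
```

If more trusted proxies sit in front of the ALB (a CDN, for example), you would skip that many entries from the right instead of taking only the last one.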


When to Choose Which?


Choose an ALB if you are building:
  • Standard web applications and microservices.
  • Containerized apps (ECS/EKS) requiring path-based or host-based routing.
  • Applications requiring tight integration with AWS Web Application Firewall (WAF).

Choose an NLB if you are building:
  • Non-HTTP applications (e.g., gaming servers, SFTP, MQTT, database clusters).
  • Architectures requiring fixed, static IP addresses or Elastic IPs.
  • High-frequency financial applications where sub-millisecond network latency is a hard requirement.

Which alerts should typically be set for an AWS ALB?


To keep your applications highly available, you should set up Amazon CloudWatch alarms for a mix of availability, performance, and target health metrics.

The most critical metrics to monitor for an AWS ALB are grouped by priority below:

1. High Priority (Critical Infrastructure Impact)

UnHealthyHostCount (Per Target Group)
What it means: The number of backend instances failing health checks (Target Group Metrics).
Alert Threshold: > 0 (or > 1 for larger clusters).
Why it matters: Signals that your servers are crashing or cannot handle traffic.

HTTPCode_Target_5XX_Count
What it means: The number of 5xx server error codes generated by your backend application (ALB Metrics).
Alert Threshold: Depends on baseline traffic, typically > 5 failures within a 1-minute to 5-minute window.
Why it matters: Indicates server crashes, database connection timeouts, or unhandled exceptions in your application code.

HTTPCode_ELB_5XX_Count
What it means: The number of 5xx errors generated directly by the ALB itself, not by your servers (ALB Metrics).
Alert Threshold: > 0.
Why it matters: Usually means the ALB cannot find any healthy hosts, or it is experiencing a configuration mismatch (e.g., bad TLS handshake with the target).

2. Medium Priority (Performance & User Experience)

TargetResponseTime
What it means: The time elapsed (in seconds) from when the ALB sent the request to the target until the target started responding (ALB Metrics).
Alert Threshold: Use the p95 or p99 statistic. Alert if it exceeds your application’s maximum acceptable latency (e.g., > 2.0 seconds).
Why it matters: Users are experiencing severe application slowdowns, likely due to high CPU/memory usage on your instances.

RejectedConnectionCount
What it means: The load balancer is rejecting connections because it has reached its maximum capacity (ALB Metrics).
Alert Threshold: > 0.
Why it matters: Your application is getting sudden traffic spikes and the ALB cannot scale fast enough, or backend targets are failing to keep up.

3. Low Priority (Anomalies & Security)

HTTPCode_Target_4XX_Count
What it means: The number of 4xx client errors (like 404 Not Found or 401 Unauthorized) returned by backend targets (ALB Metrics).
Alert Threshold: A significant spike above your standard baseline.
Why it matters: A sudden surge might indicate a broken frontend deployment, a bad API update, or a malicious entity scanning your network for vulnerabilities.

Summary Checklist for CloudWatch Alarms

Metric Name                 Statistic  Recommended Period   Suggested Threshold         Action
===========                 =========  ==================   ===================         ======
UnHealthyHostCount          Maximum    1 Minute             > 0                         Page/On-Call
HTTPCode_ELB_5XX_Count      Sum        1 Minute             > 0                         Page/On-Call
HTTPCode_Target_5XX_Count   Sum        5 Minutes            > 10 (or > 1% of traffic)   Ticket/Slack
TargetResponseTime          p95        5 Minutes            > [Your Limit]              Ticket/Slack
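The checklist maps fairly directly onto alarm definitions. A minimal sketch, assuming the CloudWatch PutMetricAlarm request shape (the same field names the AWS SDK's `PutMetricAlarmCommand` accepts); the alarm names, dimension values and the 2.0 s latency limit are hypothetical placeholders, and `AlarmActions` (the SNS topics that page or open tickets) are left out. The "> 1% of traffic" variant would need CloudWatch metric math (5XX count divided by RequestCount) and is not shown.

```typescript
// Sketch: the checklist above as PutMetricAlarm-style inputs (no SDK needed).
type Dimension = { Name: string; Value: string };

interface AlarmInput {
  AlarmName: string;
  Namespace: "AWS/ApplicationELB";
  MetricName: string;
  Dimensions: Dimension[];
  Statistic?: "Maximum" | "Sum";
  ExtendedStatistic?: string; // percentiles such as "p95" go here, not in Statistic
  Period: number; // seconds
  EvaluationPeriods: number;
  Threshold: number;
  ComparisonOperator: "GreaterThanThreshold";
  TreatMissingData: "notBreaching";
}

// Hypothetical dimension values; use your own ALB / target group suffixes.
const LB_DIM: Dimension = { Name: "LoadBalancer", Value: "app/my-alb/0123456789abcdef" };
const TG_DIM: Dimension = { Name: "TargetGroup", Value: "targetgroup/my-tg/0123456789abcdef" };

function alarm(
  metric: string,
  stat: { Statistic: "Maximum" | "Sum" } | { ExtendedStatistic: string },
  periodSec: number,
  threshold: number,
  dims: Dimension[] = [LB_DIM],
): AlarmInput {
  return {
    AlarmName: `alb-${metric}`,
    Namespace: "AWS/ApplicationELB",
    MetricName: metric,
    Dimensions: dims,
    ...stat,
    Period: periodSec,
    EvaluationPeriods: 1,
    Threshold: threshold,
    ComparisonOperator: "GreaterThanThreshold",
    // Count metrics may have no datapoints in quiet periods; don't alarm on that.
    TreatMissingData: "notBreaching",
  };
}

const alarms: AlarmInput[] = [
  alarm("UnHealthyHostCount", { Statistic: "Maximum" }, 60, 0, [TG_DIM, LB_DIM]),
  alarm("HTTPCode_ELB_5XX_Count", { Statistic: "Sum" }, 60, 0),
  alarm("HTTPCode_Target_5XX_Count", { Statistic: "Sum" }, 300, 10),
  alarm("TargetResponseTime", { ExtendedStatistic: "p95" }, 300, 2.0), // seconds
];

for (const a of alarms) console.log(a.AlarmName, a.Period, a.Threshold);
```

UnHealthyHostCount is published per target group, which is why that alarm carries both the TargetGroup and LoadBalancer dimensions.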







Wednesday, 3 June 2026

Some notes on LLMs

 

What does e.g. 1M context mean in a model description?


"1M context" means the model can handle up to 1 million tokens in its context window — the combined input (your messages, files, tool results, system prompt) plus output it can consider at once. For reference, that's roughly 750,000 words or a few thousand pages of text, letting it work with very large codebases or long conversations without losing earlier content.

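The word figure comes from the common rule of thumb of roughly 0.75 English words per token. A rough sketch of that arithmetic, assuming that ratio (real tokenizers vary by model and language) and remembering that the reply's output tokens share the same window; the function names are hypothetical.

```typescript
// Rule of thumb only: ~0.75 English words per token. Real tokenizers differ
// by model and language, so treat results as estimates.
const WORDS_PER_TOKEN = 0.75;

function estimateTokens(wordCount: number): number {
  return Math.ceil(wordCount / WORDS_PER_TOKEN);
}

// Input and output share one context window, so leave room for the reply.
function fitsInContext(inputWords: number, maxOutputTokens: number, contextTokens = 1_000_000): boolean {
  return estimateTokens(inputWords) + maxOutputTokens <= contextTokens;
}

console.log(estimateTokens(750_000)); // 1000000
console.log(fitsInContext(700_000, 32_000)); // true  (~933,334 + 32,000 tokens)
console.log(fitsInContext(750_000, 32_000)); // false (1,000,000 + 32,000 tokens)
```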
What is an LLM's tool-use loop? 


An LLM tool-use loop is the pattern where you let a language model drive an investigation by repeatedly choosing tools to call, rather than answering in one shot.

The shape


  1. Send: system prompt + user request + list of available tools (with JSON schemas)
  2. Model responds with either:
       (a) a final text answer  -> exit loop
       (b) a "tool_use" block: { name: "run_aws_cli", input: { args: [...] } }
  3. Your code executes that tool, captures the result
  4. Append the tool result to the conversation as a "tool_result" message
  5. Send the whole conversation back to the model
  6. Goto 2

The model never executes anything itself — it just emits requests to call tools. Your code is the runtime that actually runs them and feeds the output back.
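The steps above can be sketched as a small loop. This is an offline illustration only: `callModel` is a hypothetical scripted stand-in for a real LLM API call, the `list_alarms` tool is made up, and the message shapes are simplified (real APIs, for example, wrap tool results in a user message), so it shows the control flow rather than any vendor's wire format.

```typescript
// Minimal tool-use loop. callModel is scripted so the sketch runs offline.
type ToolUse = { type: "tool_use"; id: string; name: string; input: Record<string, unknown> };
type Text = { type: "text"; text: string };
type Message =
  | { role: "user"; content: string }
  | { role: "assistant"; content: (Text | ToolUse)[] }
  | { role: "tool_result"; toolUseId: string; content: string };
type Tool = (input: Record<string, unknown>) => Promise<string>;

// Hypothetical tool registry: the model can only request names listed here.
const tools = new Map<string, Tool>([
  ["list_alarms", async () => JSON.stringify(["HighCPU", "5xxSpike"])],
]);

// Scripted "model": asks for one tool call, then answers once it has a result.
async function callModel(history: Message[]): Promise<(Text | ToolUse)[]> {
  const sawResult = history.some((m) => m.role === "tool_result");
  return sawResult
    ? [{ type: "text", text: "Two alarms firing: HighCPU, 5xxSpike." }]
    : [{ type: "tool_use", id: "t1", name: "list_alarms", input: {} }];
}

async function runLoop(request: string, maxIterations = 30): Promise<string> {
  const history: Message[] = [{ role: "user", content: request }]; // step 1
  for (let i = 0; i < maxIterations; i++) {
    const reply = await callModel(history); // step 2
    history.push({ role: "assistant", content: reply });
    const calls = reply.filter((b): b is ToolUse => b.type === "tool_use");
    if (calls.length === 0) {
      // (a) no tool_use block: final answer, exit the loop
      return reply.filter((b): b is Text => b.type === "text").map((b) => b.text).join("\n");
    }
    for (const call of calls) {
      // steps 3-4: our code runs the tool and appends the result
      const tool = tools.get(call.name);
      const result = tool ? await tool(call.input) : `error: unknown tool ${call.name}`;
      history.push({ role: "tool_result", toolUseId: call.id, content: result });
    }
    // steps 5-6: next iteration sends the whole history back
  }
  throw new Error(`budget exhausted after ${maxIterations} iterations`);
}

runLoop("Which alarms are firing?").then(console.log);
```

Unknown tool names are reported back to the model as an error string rather than thrown, so the model gets a chance to correct itself on the next turn.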

Why it's a loop

Each turn the model sees everything it has learned so far (prior tool calls + their outputs) and decides the next step based on that. So a real run looks like:

  - Turn 1: model calls cloudwatch describe-alarms --state-value ALARM
  - Turn 2: sees 3 alarms, picks the noisiest, calls logs filter-log-events for that log group around the alarm time
  - Turn 3: sees an error pattern, calls kubectl describe pod on the affected workload
  - Turn 4: emits final Markdown report, no tool call → loop exits

  The model is doing the planning; your code is the dispatcher.

Why you need a budget

Without limits the loop can spin forever — the model keeps finding "one more thing to check." Hence in agent.run():

  - max_iterations=30 — hard cap on turns
  - max_tokens_per_turn=12288 — cap on a single response
  - Per-tool wall-clock timeouts (60 s for CLI, 30 s for HTTP)
  - Output truncation (50 000 char stdout) so a giant tool result doesn't blow the context window
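Two of those budgets, the per-tool wall-clock timeout and output truncation, are easy to enforce in the dispatcher. A sketch using the numbers listed above; the helper names are hypothetical. Note that `Promise.race` only stops waiting and does not stop the work, so for a real subprocess you would also kill the child (Node's `child_process.execFile` accepts a `timeout` option for that).

```typescript
// Limits taken from the notes above; helper names are hypothetical.
const MAX_TOOL_OUTPUT_CHARS = 50_000;
const CLI_TIMEOUT_MS = 60_000;

// Cut oversized tool output and tell the model it was cut, so it can narrow
// its next request instead of flooding the context window.
function truncateOutput(output: string, limit = MAX_TOOL_OUTPUT_CHARS): string {
  if (output.length <= limit) return output;
  return `${output.slice(0, limit)}\n[truncated ${output.length - limit} chars]`;
}

// Reject if `work` has not settled within `ms`. This stops *waiting*; it does
// not cancel the work itself.
async function withTimeout<T>(work: Promise<T>, ms = CLI_TIMEOUT_MS): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`tool timed out after ${ms} ms`)), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer); // let the process exit once the race is decided
  }
}
```

In the dispatcher, a tool call would then look like `truncateOutput(await withTimeout(runTool(args)))`, where `runTool` is whatever executes the tool.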

How it ends

The loop terminates when the model returns a response with no tool_use block — that's the "I'm done, here's the answer" signal (stop_reason: end_turn). Or when you hit a budget limit and force-stop it.

Where the safety lives

Because the model can ask for arbitrary tool calls, the loop is only as safe as the tool implementations. That's why an agent should enforce allowlists (services, verbs, paths): the model can request aws s3 rm, but the validator rejects it before subprocess.run ever sees it.
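A validator of that kind might look like the following sketch. The allowlist contents are example assumptions, and the tool is assumed to receive the AWS CLI arguments as an array (as in the `run_aws_cli` input shape above). In this TypeScript version those arguments would then go to `child_process.execFile` without a shell, so nothing can be smuggled in through shell syntax. Global flags placed before the service (e.g. `--region`) are rejected rather than parsed, which errs on the safe side; path checks are not shown.

```typescript
// Hypothetical allowlist: service -> permitted verbs. Entries ending in "-"
// are verb prefixes (describe-*, get-*); all others must match exactly.
const ALLOWED = new Map<string, string[]>([
  ["cloudwatch", ["describe-", "get-", "list-"]],
  ["logs", ["describe-", "filter-log-events", "get-"]],
  ["s3", ["ls"]],
]);

type Verdict = { ok: true } | { ok: false; reason: string };

function validateAwsArgs(args: string[]): Verdict {
  const [service, verb] = args;
  if (!service || !verb) return { ok: false, reason: "expected: <service> <verb> [flags...]" };
  const allowed = ALLOWED.get(service); // Map lookup avoids prototype keys like "constructor"
  if (!allowed) return { ok: false, reason: `service not allowed: ${service}` };
  const permitted = allowed.some((v) => (v.endsWith("-") ? verb.startsWith(v) : verb === v));
  return permitted ? { ok: true } : { ok: false, reason: `verb not allowed: ${service} ${verb}` };
}

console.log(validateAwsArgs(["cloudwatch", "describe-alarms", "--state-value", "ALARM"])); // { ok: true }
console.log(validateAwsArgs(["s3", "rm", "s3://bucket/key"]));
```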

The "two-pass" design in agent is a refinement: pass 1 is a tool-use loop (gather), pass 2 is a single non-loop call (synthesize). Splitting them lets each prompt focus on one job.


What are those .md files used by AI Agents?


There isn't a universally agreed official name, but people commonly refer to files like CLAUDE.md, GEMINI.md, AGENTS.md, copilot-instructions.md, and .cursorrules as:

  • AI agent instruction files (most generic)
  • Agent configuration files
  • Agent context files
  • LLM instruction files
  • Repository AI instructions
  • Project AI guidelines

In the developer tooling community, "agent instructions" or "agent context files" are probably the most widely understood umbrella terms.

For example:

Tool               File
----               ----
Claude Code        CLAUDE.md
Gemini CLI         GEMINI.md
GitHub Copilot     .github/copilot-instructions.md
OpenAI Codex CLI   AGENTS.md
Cursor             .cursorrules / project rules
Windsurf           Rules files

Collectively, you could describe them as:

"Repository-level AI agent instruction files that provide persistent context and operating rules for coding assistants."

If you're building tooling around them (e.g., in your DevOps work), I'd recommend using "agent instructions" as the generic term because it's vendor-neutral and easily understood across Claude, Gemini, Copilot, Cursor, Codex, and similar tools.
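For such tooling, a first step is often just detecting which instruction files a repository contains. A minimal sketch using Node's `fs` and `path` modules; the file list is taken from the table above (Windsurf is omitted because the table gives no concrete file name, and Cursor's newer project-rule folders are not checked), and `findAgentInstructions` is a hypothetical name. The `exists` parameter is injectable so the function can be tested without touching disk.

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";

// Repo-relative paths from the table above.
const AGENT_INSTRUCTION_FILES: ReadonlyArray<[tool: string, path: string]> = [
  ["Claude Code", "CLAUDE.md"],
  ["Gemini CLI", "GEMINI.md"],
  ["GitHub Copilot", ".github/copilot-instructions.md"],
  ["OpenAI Codex CLI", "AGENTS.md"],
  ["Cursor", ".cursorrules"],
];

// Returns "Tool: path" for every instruction file present under repoRoot.
function findAgentInstructions(repoRoot: string, exists: (p: string) => boolean = existsSync): string[] {
  return AGENT_INSTRUCTION_FILES
    .filter(([, rel]) => exists(join(repoRoot, rel)))
    .map(([tool, rel]) => `${tool}: ${rel}`);
}

console.log(findAgentInstructions(process.cwd()));
```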



Introduction to Claude by Anthropic




How to give Claude an instruction to apply label "DevOps" whenever it creates a new Linear ticket?



It depends on whether you're creating Linear tickets via:
  • Linear MCP,
  • a custom /create-ticket command,
  • or just asking Claude in chat to create tickets
The best location depends on which of those you're using.


If you're just asking Claude in chat to create tickets: if you want this behavior for all projects, put it in:

~/.claude/CLAUDE.md

If you want it only for a specific repository/project, put it in:

<repo-root>/CLAUDE.md

If you want it only for yourself in a specific project (without committing it to git), put it in:

<repo-root>/CLAUDE.local.md

CLAUDE.local.md is usually the best place for personal workflow preferences like Linear labels. Claude loads it after the project-level file, so your preference is read later and tends to have more influence.

For example:

## Linear Ticket Creation

When creating new Linear issues:

- Always apply the "DevOps" label.
- If multiple labels are appropriate, include "DevOps" in addition to the others.
- Verify the label exists before creating the issue.

If you're creating tickets through a custom slash command (e.g. /create-ticket) or a Linear MCP workflow, an even more reliable approach is to put the instruction directly in the command definition under:

~/.claude/commands/create-ticket.md

or in a dedicated skill, because the instruction is then attached specifically to the ticket-creation workflow rather than relying on general memory. Many Claude Code users find workflow-specific commands more reliable than large global CLAUDE.md files for things like Linear automation.


How to instruct Claude to use feature branch name from a Linear ticket when implementing it?


Add a specific workflow rule to CLAUDE.md rather than a vague instruction. Claude follows procedural instructions much more reliably.

For example:

## Linear Ticket Workflow

When implementing work from a Linear ticket:

1. Retrieve the ticket details from Linear.
2. Check whether the ticket has a suggested branch name.
3. Before making any code changes, switch to or create a git branch using the Linear branch name.
4. Do not create an alternative branch name unless the Linear branch name is unavailable.
5. If the branch already exists remotely, check it out instead of creating a new branch.
6. Confirm the branch name being used before starting implementation.

Example:

Linear ticket: DEV-123 Add deployment health checks
Branch: bojan/dev-123-add-deployment-health-checks

Commands:

git fetch origin
git checkout bojan/dev-123-add-deployment-health-checks || \
git checkout -b bojan/dev-123-add-deployment-health-checks
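Rule 4 allows an alternative name only when Linear provides none. If you want that fallback to look like the example above, a sketch could be the following; the `<user>/<ticket-id>-<title-slug>` shape is inferred from that one example rather than from Linear's own algorithm, and `fallbackBranchName` and its 50-character slug limit are hypothetical.

```typescript
// Builds <user>/<ticket-id>-<title-slug>, imitating the example above only.
// Prefer the branch name Linear provides; this is for when it has none.
function fallbackBranchName(user: string, ticketId: string, title: string, maxSlug = 50): string {
  const slug = title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse any run of non-alphanumerics into one dash
    .replace(/^-+|-+$/g, "") // trim leading/trailing dashes
    .slice(0, maxSlug)
    .replace(/-+$/, ""); // don't end on a dash after truncation
  return `${user.toLowerCase()}/${ticketId.toLowerCase()}-${slug}`;
}

console.log(fallbackBranchName("bojan", "DEV-123", "Add deployment health checks"));
// bojan/dev-123-add-deployment-health-checks
```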

If you're using the Linear MCP server, I'd make it even more explicit:

## Branch Naming

For any task originating from Linear:

- Always obtain the branch name from the Linear ticket metadata.
- Never invent a branch name if Linear provides one.
- Use the Linear-provided branch name exactly as written.
- Check out the branch before planning or implementing changes.

An even better approach is to create a dedicated command such as .claude/commands/implement-ticket.md:

Given a Linear ticket ID:

1. Load the ticket.
2. Read the ticket's branch name.
3. Check out the branch.
4. Implement the work.
5. Commit using the ticket ID in the commit message.

Workflow-specific commands tend to be more reliable than relying on global instructions in CLAUDE.md, especially for things like branch creation, ticket labeling, and PR naming.


If I update ~/.claude/CLAUDE.md, do I need to start a new Claude session?


It depends on how you’re using Claude, but in practice a new session is the only reliable way to pick up changes:

If you’re using Claude Code / CLI (most common case)


~/.claude/CLAUDE.md is typically read when a new session starts

So:
✅ New terminal session / new Claude run → picks up changes
⚠️ Already-running interactive session → may not reliably reload it

Best practice: restart the Claude session (or re-open the CLI/chat) after changing it if you want to be sure the new rules apply.

If you’re using a long-running agent session


(e.g. an ongoing “Claude is working in the repo” process)

It often caches the instructions in memory at session start
So updates won’t consistently apply mid-task

If you’re using repo-level CLAUDE.md


Same rule: loaded at session start or project initialization
Changes apply reliably on next session or re-attach to repo

Practical rule of thumb


Changed CLAUDE.md → start a new Claude session to guarantee it applies
Minor edits while actively working → assume they won’t fully apply until restart


How to add auto-generated instructions to global context?

During a Claude Code session, Claude created two memories, but they were bound to the current project only:

Write(/Users/bojan/.claude/projects/-Users-bojan-repos-Company-Project/memory/gh-token-unset.md)
  ⎿  Wrote 10 lines to /Users/bojan/.claude/projects/-Users-bojan-repos-Company-Project/memory/gh-token-unset.md
       1 ---
       2 name: gh-token-unset
       3 description: gh CLI fails with HTTP 401 Bad credentials unless GH_TOKEN is unset first
       4 metadata:
       5   node_type: memory
       6   type: reference
       7   originSessionId: 9xxx6
       8 ---
       9
      10 In this environment `gh` fails with `HTTP 401: Bad credentials` because a stale `GH_TOKEN` env var overrides the working keyring auth. Run `unset GH_TOKEN` in the same Bash command before any `gh`
          call — the keyring account (`BojanKomazec`, scopes gist/read:org/repo) then works. Env doesn't persist between Bash calls, so include the `unset` each time.

⏺ Write(/Users/bojan/.claude/projects/-Users-bojan-repos-Company-Project/memory/precommit-hook-needs-no-verify.md)
  ⎿  Wrote 10 lines to /Users/bojan/.claude/projects/-Users-bojan-repos-Company-Project/memory/precommit-hook-needs-no-verify.md
       1 ---
       2 name: precommit-hook-needs-no-verify
       3 description: git commit aborts in non-interactive shell; the pre-commit hook prompts on /dev/tty
       4 metadata:
       5   node_type: memory
       6   type: reference
       7   originSessionId: 9xxx6
       8 ---
       9
      10 The global pre-commit hook (`~/.git-templates/hooks/pre-commit`) prompts interactively via `/dev/tty` ("view diff? proceed with commit?"). In a non-interactive shell it errors `/dev/tty: Device not configured` and aborts the commit. Use `git commit --no-verify` to bypass it when committing programmatically.

⏺ Write(/Users/bojan/.claude/projects/-Users-bojan-repos-Company-Project/memory/MEMORY.md)
  ⎿  Wrote 2 lines to /Users/bojan/.claude/projects/-Users-bojan-repos-Company-Project/memory/MEMORY.md
      1 - [gh token unset](gh-token-unset.md) — `unset GH_TOKEN` before any `gh` command or it 401s
      2 - [pre-commit hook needs --no-verify](precommit-hook-needs-no-verify.md) — commit hook prompts on /dev/tty, aborts non-interactively

What you’re looking at in:

~/.claude/projects/.../memory/MEMORY.md

is project-scoped learned memory (auto-generated runtime notes), not the best place to define global rules.

To make something apply to all projects, you should put it in:

~/.claude/CLAUDE.md

How to structure it properly


Think of ~/.claude/CLAUDE.md as your global “agent operating system”.

You can directly migrate those MEMORY.md rules into it like this:

## Global DevOps / CLI Behavior Rules

These rules apply to all projects unless explicitly overridden.

### GitHub CLI
- Always run `unset GH_TOKEN` before executing any `gh` command.
- Reason: a stale GH_TOKEN env var overrides the working keyring auth and causes 401 errors.

### Git commits
- When committing changes, always use `--no-verify`.
- Reason: pre-commit hooks may attempt interactive prompts via /dev/tty and break non-interactive runs.


Important distinction (this is the key part)


File                              Scope                          Who writes it  Purpose
----                              -----                          -------------  -------
~/.claude/CLAUDE.md               Global (all projects)          You            Your permanent rules
<repo>/CLAUDE.md                  Per project                    You/team       Repo-specific rules
~/.claude/projects/.../MEMORY.md  Per project (auto-generated)   Claude         Learned behavior, not authoritative

👉 Never manually “promote” MEMORY.md as your source of truth
Instead, copy stable patterns into ~/.claude/CLAUDE.md.
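Reviewing candidates for promotion is easier with the memory files listed as draft rules. A sketch, assuming the simple frontmatter layout shown in the transcript above (`---`, then `name:` and `description:` lines, then `---`); it is not a full YAML parser, you would feed it the contents of each file in ~/.claude/projects/.../memory/, and `parseMemory` / `toDraftRule` are hypothetical names.

```typescript
interface Memory {
  name: string;
  description: string;
  body: string;
}

function parseMemory(text: string): Memory | null {
  const lines = text.split("\n");
  if (lines[0]?.trim() !== "---") return null;
  const end = lines.findIndex((l, i) => i > 0 && l.trim() === "---");
  if (end === -1) return null;
  // Only top-level "key: value" lines are read; nested metadata is ignored.
  const field = (key: string): string => {
    const line = lines.slice(1, end).find((l) => l.startsWith(`${key}:`));
    return line ? line.slice(key.length + 1).trim() : "";
  };
  return {
    name: field("name"),
    description: field("description"),
    body: lines.slice(end + 1).join("\n").trim(),
  };
}

// Draft a CLAUDE.md bullet; a human still decides whether the rule is stable.
function toDraftRule(m: Memory): string {
  return `- ${m.description} (from project memory "${m.name}"; review before adding)`;
}
```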

For your Linear branch-name requirement


You’d add it here too:

## Linear Workflow (Global)

When working from Linear tickets:
- Always use the branch name provided by Linear metadata.
- Never generate custom branch names.
- Always check out the Linear branch before implementation.


One extra improvement (worth doing)


If you want this to be really robust, structure your global file like:

~/.claude/CLAUDE.md
├── Git / GitHub rules
├── Linear workflow rules
├── Shell safety rules
├── CI/CD rules

Claude responds better to clear domains than long flat lists.









Thursday, 14 May 2026

Introduction to Checkly



How Checkly works

Checkly is a SaaS synthetic monitoring platform — you define "checks" (HTTP requests or browser scripts), Checkly runs them on a schedule from probe locations around the world (or on-demand from CI), records latency/assertions/screenshots, and alerts you when they fail or get slow.
  
  Two main check types:

  - API checks — a single HTTP request with assertions on status, headers, body, response time.
  - Browser checks — a Playwright script run in a real headless Chromium against your deployed app.

There are also multi-step API checks (chain requests, e.g. login → use token → logout) and heartbeat checks (your job pings Checkly; Checkly alerts if the pings stop).

Heartbeat vs Ping 

Heartbeats and pings are both vital network failure-detection mechanisms, but they differ in purpose: Heartbeats are proactive, periodic "I am alive" messages sent by an application to signal it is healthy, while Pings are reactive requests to check if a server is reachable. Heartbeats detect application crashes, while pings detect network downtime.
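The push side of that difference fits in a few lines. This sketch assumes a heartbeat monitor that expects a ping every period plus a grace window (Checkly's heartbeat checks are configured with a period and a grace period in a similar way); the function names, the injected `ping` callback and the timings are hypothetical. In a real job, `ping` would request the heartbeat URL your monitor provides. The pull side (a ping) is simply the monitor making the request itself, so it is not shown.

```typescript
// Heartbeat (push): the job reports in; the monitor alerts on silence.
// The monitor treats a heartbeat as missing once period + grace has passed.
function isHeartbeatOverdue(lastPingMs: number, nowMs: number, periodMs: number, graceMs: number): boolean {
  return nowMs - lastPingMs > periodMs + graceMs;
}

// Sender side: ping only after the job succeeds, so a crash or failure
// produces silence and therefore an alert.
async function runWithHeartbeat(job: () => Promise<void>, ping: () => Promise<void>): Promise<boolean> {
  try {
    await job();
  } catch {
    return false; // job failed: deliberately send no heartbeat
  }
  await ping();
  return true;
}

const HOUR = 60 * 60 * 1000;
console.log(isHeartbeatOverdue(0, HOUR + 30_000, HOUR, 5 * 60_000)); // false: still inside grace
console.log(isHeartbeatOverdue(0, HOUR + 6 * 60_000, HOUR, 5 * 60_000)); // true: period + grace exceeded
```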

Checks are typically authored as code (Checkly CLI, TypeScript) and deployed to the cloud with npx checkly deploy. You can tag them (tags: ["auth"]), parametrise them with env vars like ENVIRONMENT_URL, and trigger them on-demand from CI, which is exactly what this PR does with npx checkly trigger --tags=auth.
  
  Runtime model:
  - Scheduled: every N minutes from chosen regions (e.g. us-east-2, eu-west-1) — catches regressions/outages between deploys.
  - Triggered from CI: post-deploy smoke test, results gate (or just annotate) the deploy.
  - Alerts: Slack/PagerDuty/email on failure, with retry/degraded thresholds to avoid flapping.
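The retry/degraded idea can be illustrated with a small sketch. The two thresholds mirror the `degradedResponseTime` and `maxResponseTime` settings on Checkly API checks, but the default values here and the "N consecutive failures" alert gate are illustrative assumptions, not Checkly's exact alerting logic.

```typescript
type Outcome = "passing" | "degraded" | "failing";

// Hypothetical defaults: slower than 1 s is degraded, slower than 3 s fails.
function classify(responseMs: number, ok: boolean, degradedMs = 1000, maxMs = 3000): Outcome {
  if (!ok || responseMs > maxMs) return "failing";
  return responseMs > degradedMs ? "degraded" : "passing";
}

// Alert only when the last `threshold` runs all failed, so a single network
// blip does not page anyone. Degraded runs never alert here.
function shouldAlert(history: Outcome[], threshold = 2): boolean {
  if (history.length < threshold) return false;
  return history.slice(-threshold).every((o) => o === "failing");
}

console.log(classify(1500, true)); // degraded
console.log(shouldAlert(["failing", "passing", "failing"])); // false
console.log(shouldAlert(["passing", "failing", "failing"])); // true
```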

  ---

  What it would check for this auth API
  
  Given the auth API's surface (login, OAuth, JWT issuance, admin endpoints), realistic auth-tagged checks:

  1. Health endpoint — basic liveness

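  // Shared imports for the check snippets in this section (assumes they live in one *.check.ts file)
  import * as path from "path";
  import { ApiCheck, AssertionBuilder, BrowserCheck, MultiStepCheck } from "checkly/constructs";

  new ApiCheck("auth-health", {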
    name: "Auth API – health",
    tags: ["auth"],
    frequency: 1, // minute
    locations: ["us-east-2", "eu-west-1"],
    request: {
      url: `${process.env.ENVIRONMENT_URL}/health`,
      method: "GET",
      assertions: [
        AssertionBuilder.statusCode().equals(200),
        AssertionBuilder.responseTime().lessThan(500),
        AssertionBuilder.jsonBody("$.status").equals("ok"),
      ],
    },
  });

  2. Login flow — happy path, returns a JWT

  new ApiCheck("auth-login", {
    name: "Auth API – login returns JWT",
    tags: ["auth"],
    request: {
      url: `${process.env.ENVIRONMENT_URL}/auth/login`,
      method: "POST",
      headers: [{ key: "Content-Type", value: "application/json" }],
      body: JSON.stringify({
        email: process.env.SYNTHETIC_USER_EMAIL,
        password: process.env.SYNTHETIC_USER_PASSWORD,
      }),
      assertions: [
        AssertionBuilder.statusCode().equals(200),
        AssertionBuilder.responseTime().lessThan(1500),
        AssertionBuilder.jsonBody("$.token").isNotNull(),
        // structural check on JWT shape
        AssertionBuilder.jsonBody("$.token").matches("^eyJ[A-Za-z0-9_-]+\\.[A-Za-z0-9_-]+\\.[A-Za-z0-9_-]+$"),
      ],
    },
  });
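The structural regex works because a JWT is three base64url segments separated by dots, and the first segment is a JSON header beginning with `{"` followed by a letter, which base64url-encodes to `eyJ`. A small sketch demonstrating that with Node's `Buffer` (the `base64url` encoding needs a recent Node.js); the token is unsigned demo data, not a real credential.

```typescript
// Why the assertion expects "eyJ": build an unsigned demo token and test it.
const JWT_SHAPE = /^eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+$/;

const b64url = (obj: object): string => Buffer.from(JSON.stringify(obj)).toString("base64url");

const header = b64url({ alg: "HS256", typ: "JWT" });
const payload = b64url({ sub: "synthetic-user" });
const demoToken = `${header}.${payload}.c2lnbmF0dXJl`; // "signature" in base64, not a real HMAC

console.log(header.startsWith("eyJ")); // true: '{"' followed by a letter encodes to "eyJ"
console.log(JWT_SHAPE.test(demoToken)); // true
console.log(JWT_SHAPE.test("not-a-jwt")); // false
```

The regex only proves the shape; it says nothing about whether the signature is valid, which is why the multi-step check that actually uses the token is the stronger test.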
  
  3. Login — wrong password returns 401 (negative path)

  Catches the "accidentally accepts anything" class of regression.

  new ApiCheck("auth-login-bad-pw", {
    name: "Auth API – wrong password = 401",
    tags: ["auth"],
    request: {
      url: `${process.env.ENVIRONMENT_URL}/auth/login`,
      method: "POST",
      headers: [{ key: "Content-Type", value: "application/json" }],
      body: JSON.stringify({ email: process.env.SYNTHETIC_USER_EMAIL, password: "wrong" }),
      assertions: [AssertionBuilder.statusCode().equals(401)],
    },
  });
  
  4. Multi-step — login then call protected endpoint

  This is the most useful kind for an auth API, because it proves the token actually works.

  new MultiStepCheck("auth-token-roundtrip", {
    name: "Auth API – token works against /me",
    tags: ["auth"],
    code: { entrypoint: path.join(__dirname, "token-roundtrip.spec.ts") },
  });

  // token-roundtrip.spec.ts
  import { test, expect } from "@playwright/test";
  test("login then /me", async ({ request }) => {
    const login = await request.post(`${process.env.ENVIRONMENT_URL}/auth/login`, {
      data: { email: process.env.SYNTHETIC_USER_EMAIL, password: process.env.SYNTHETIC_USER_PASSWORD },
    });
    expect(login.ok()).toBeTruthy();
    const { token } = await login.json();
    
    const me = await request.get(`${process.env.ENVIRONMENT_URL}/me`, {
      headers: { Authorization: `Bearer ${token}` },
    });
    expect(me.status()).toBe(200);
    const body = await me.json();
    expect(body.email).toBe(process.env.SYNTHETIC_USER_EMAIL);
  });
  
  5. TLS & cert expiry

  A pure config check — useful because cert rotation is a classic outage cause.

  new ApiCheck("auth-tls", {
    name: "Auth API – TLS cert valid > 14d",
    tags: ["auth"],
    request: {
      url: `${process.env.ENVIRONMENT_URL}/health`,
      method: "GET",
      assertions: [AssertionBuilder.statusCode().equals(200)],
    },
    // Checkly surfaces cert expiry on the run; you set a threshold per check
  });
  
  6. Browser check — full login UX

  Runs against the front-end but exercises the auth API end-to-end including redirects, cookies, CSRF.

  new BrowserCheck("auth-ui-login", {
    name: "Login UI works",
    tags: ["auth"],
    code: { entrypoint: path.join(__dirname, "login.spec.ts") },
  });

  // login.spec.ts
  import { test, expect } from "@playwright/test";
  test("user can sign in", async ({ page }) => {
    await page.goto(process.env.ENVIRONMENT_URL!);
    await page.getByLabel("Email").fill(process.env.SYNTHETIC_USER_EMAIL!);
    await page.getByLabel("Password").fill(process.env.SYNTHETIC_USER_PASSWORD!);
    await page.getByRole("button", { name: "Sign in" }).click();
    await expect(page.getByText("Dashboard")).toBeVisible({ timeout: 10_000 });
  });

  7. OAuth callback reachability

  Doesn't fully exercise the Google/Microsoft flow (those need real consent), but checks the callback
  endpoint responds correctly to a missing-code request — confirms route + handler are wired.

  new ApiCheck("auth-oauth-google-callback-shape", {
    name: "Auth API – Google OAuth callback exists",
    tags: ["auth"],
    request: {
      url: `${process.env.ENVIRONMENT_URL}/auth/google/callback`,
      method: "GET",
      assertions: [
        // 400 for missing `code`, not 404/500 — proves handler is mounted
        AssertionBuilder.statusCode().equals(400),
      ],
    },
  });