Elastic Agents do not strictly require Fleet and Fleet Server. You can deploy them in standalone mode, which allows you to manually configure and manage them using local configuration files.
However, running Elastic Agents in Fleet-managed mode with a Fleet Server is the recommended best practice for most enterprise environments.
Here is how the two approaches compare:
1. Fleet-Managed Mode (Recommended)
In this setup, you use the Fleet UI in Kibana to centrally manage agent policies, roll out upgrades, and apply integrations.
- How it works: Agents connect to a Fleet Server (itself a specialized Elastic Agent process), which reads policies from Elasticsearch and delivers them to agents when they check in.
- Best for: Large-scale deployments, continuous monitoring, and Elastic Security/Endpoint integrations.
- Action: To set this up, refer to the official Elastic Agent Installation Guide.
2. Standalone Mode
In this setup, you manually install the agent and define its inputs, outputs, and integrations directly in a local YAML configuration file.
- How it works: The agent connects directly to outputs like Elasticsearch or Logstash without a Fleet Server intermediary.
- Limitations: Central management, automated upgrades, and certain advanced Endpoint Security features are disabled.
- Best for: Edge cases, highly air-gapped networks, or evaluating specific integrations.
- Action: To configure this, use the steps outlined in the Standalone Elastic Agent Tutorial.
Example of how an Elastic Fleet can actually be wired
Components & where they live
Everything is in cluster company-prod-elastic-eks (EKS, us-east-2, now on v1.36.2), namespace elastic-system, across 5 nodes in 3 AZs:
Node            AZ  Node Group              Runs
====            ==  ==========              ====
ip-10-99-44-1   2a  es MNG (m7g.2xlarge)    agent
ip-10-99-55-34  2b  default MNG (m5.large)  agent + Fleet Server pod
ip-10-99-55-89  2b  es MNG (m7g.2xlarge)    agent
ip-10-99-66-51  2c  es MNG (m7g.2xlarge)    agent
ip-10-99-66-78  2c  default MNG (m5.large)  agent
- Fleet Server — Deployment fleet-server-prod, replicas: 1 (single pod, currently on the 2b node). Listens on :8220 HTTPS. Exposed by a NodePort service fleet-server-prod-agent-http (8220 --> 32202 on every node).
- Elastic Agents — DaemonSet agent-prod-eck-agent, one pod per node (5 total), spread across all 3 AZs. hostNetwork: true, mode: fleet, FLEET_INSECURE=true. They collect node/pod logs + metrics (hostPath mounts of /var/log/...).
- ALB k8s-elastics-eckfleet-a1b2c3d4e5 — internal (not internet-facing), spanning subnets in 2a / 2b / 2c. Created by the AWS LB controller from Ingress eck-fleet-server-prod-ingress. Targets = all 5 nodes' NodePort 32202 (instance mode), backend-protocol: HTTPS, idle_timeout: 300s.
- DNS — eck-fleet-server.internal-domain.local (Route53 via external-dns) --> the internal ALB. Elasticsearch is a separate endpoint, elasticsearch.internal-domain.local:443.
Who listens, who initiates
The key thing: every connection is initiated by the agent (outbound). Fleet Server never opens a connection to an agent, which is why this works with agents on hostNetwork that have no inbound rules of their own.
                       ┌────────────────────────────────────────────────────┐
                       │ cluster company-prod-elastic-eks / elastic-system  │
┌───────────┐          │                                                    │
│ Agent     │ check-in │ internal ALB            Fleet Server (1 pod)       │
│ DaemonSet │─────────►│ eckfleet-a1b2...   ┌──► :8220 (HTTPS)              │
│ (5 pods,  │  :443    │ :443 TLS ──────────┘    NodePort 32202 → 8220      │
│  3 AZs)   │  HTTPS   │ (internal cert)         kube-proxy → the 1 pod     │
└─────┬─────┘          │     ▲  re-encrypt HTTPS to a node's :32202         │
      │                │     │                         │                    │
      │ ship data      └─────┼─────────────────────────┼────────────────────┘
      │ (logs/metrics)       │ DNS                     │ writes .fleet-* / agent
      ▼                 eck-fleet-server.              ▼ metadata, reads policy
elasticsearch.          internal-domain.local   Elasticsearch + Kibana (Fleet app)
internal-domain.local:443 ◄─────────────────────────────
(NOT via the fleet ALB)
Traffic Flows: Breakdown
1. Agent --> Fleet Server (Control Plane: Enrollment & Policy Check-in)
The agent acts as the client, and Fleet Server listens on :8220. The agent dials https://eck-fleet-server.internal-domain.local:443:
Traffic Path:
agent --> DNS --> internal ALB :443 (TLS termination, internal cert) --> re-encrypt --> a node's NodePort :32202 --> kube-proxy --> the single Fleet Server pod :8220
Cross-AZ Routing:
Because the agent always targets the ALB DNS name (not the pod directly), an agent can land on any node's NodePort. It is then forwarded by kube-proxy to whichever node is hosting the single Fleet Server pod—making cross-AZ hops standard behavior.
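To put a rough number on how often that extra hop happens, here is a small worked calculation using the node table above. It assumes the ALB picks each of the 5 registered nodes with equal probability, which is a simplification (real ALB target selection depends on its routing algorithm and cross-zone settings), and it counts only the kube-proxy hop from the chosen node to the Fleet Server pod. The names NODE_AZ, FLEET_SERVER_NODE and hop_fractions are made up for this sketch.

```python
from fractions import Fraction

# Node -> AZ, copied from the node table above.
NODE_AZ = {
    "ip-10-99-44-1": "2a",
    "ip-10-99-55-34": "2b",  # currently hosts the single Fleet Server pod
    "ip-10-99-55-89": "2b",
    "ip-10-99-66-51": "2c",
    "ip-10-99-66-78": "2c",
}
FLEET_SERVER_NODE = "ip-10-99-55-34"


def hop_fractions(node_az, pod_node):
    """Return (fraction forwarded to another node, fraction crossing an AZ).

    Assumption: every registered node is equally likely to receive a request.
    """
    total = len(node_az)
    pod_az = node_az[pod_node]
    other_node = sum(1 for node in node_az if node != pod_node)
    other_az = sum(1 for az in node_az.values() if az != pod_az)
    return Fraction(other_node, total), Fraction(other_az, total)


forwarded, cross_az = hop_fractions(NODE_AZ, FLEET_SERVER_NODE)
print(f"forwarded by kube-proxy to another node: {forwarded}")  # 4/5
print(f"forwarded into a different AZ: {cross_az}")             # 3/5
```

Under that assumption, four out of five check-ins take a node-to-node hop, and three out of five cross an AZ boundary on the way to the pod.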
The Long Poll Mechanism: The check-in is a long poll. The agent opens a connection, and Fleet Server holds it open for up to ~5 minutes until a policy change occurs or the poll times out.
The Latency "False Positive": This held-open duration is exactly what the ALB records as a TargetResponseTime of ~300s, which triggered the false-positive alert. The ALB's idle_timeout is intentionally set to 300s to support these long polls. By contrast, the health check (/health, expects 404, 300s interval) is a separate lightweight probe, which is why targets report as healthy while the "latency" metric looks badly inflated.
2. Agent --> Elasticsearch (Data Plane: Telemetry Shipping)
This data path does not traverse the Fleet ALB. The Fleet output configuration points agents directly at https://elasticsearch.internal-domain.local:443.
Consequently, the eckfleet ALB only handles control-plane check-in traffic. This explains why its request volume is very low (~3–5 requests/min) while individual request durations hover around 300s. The control plane and the data plane are two entirely separate paths, each behind its own load balancer.
3. Fleet Server --> Elasticsearch + Kibana
Fleet Server (via elasticsearchRefs / kibanaRef) acts as the client here. It writes agent and policy metadata directly into the .fleet-* indices and coordinates with the Kibana Fleet application, where the eck-fleet-server and eck-agent policies are managed.
One-Line Summary
Agents (DaemonSet, 5 pods across 2a/2b/2c) long-poll the single Fleet Server pod for policy updates via the internal ALB on :443 --> NodePort 32202 --> :8220, but ship their actual telemetry data directly to Elasticsearch, completely bypassing that ALB. The ALB only processes the slow, held-open control traffic—which is the root cause of the alarm.
Why does Fleet Server hold the connection open for so long? Why doesn't it reply immediately?
It's a deliberate design choice called long polling, and it's the opposite of a bug. Here's the reasoning.
The problem Fleet is solving
Fleet Server's job is to push policy changes out to agents promptly — a new integration, a changed log path, an output credential rotation. Agents need to find out "has my policy changed?" with low latency.
There are three ways an agent could learn about changes:
1. Short polling — agent asks "anything new?" every N seconds, server replies instantly "no", agent sleeps, repeats.
2. Server push — server opens a connection to each agent when something changes.
3. Long polling — agent asks "anything new?", and the server holds the request open until either something actually changes or a timeout fires.
Fleet uses #3, and the connection sitting open for ~5 minutes is that hold.
Why not reply immediately (short polling)?
If Fleet answered every check-in instantly with "nothing changed", then to get fast reaction to a policy change, agents would have to poll very frequently — say every few seconds. With your 5 agents that's tolerable, but Fleet is built to manage thousands to tens of thousands of agents from one server. At that scale:
- Frequent short polls = a constant storm of requests, almost all of which return "no change". Huge wasted CPU/network on both ends.
- To keep reaction time low you'd poll more often, which makes the storm worse. To cut the storm you'd poll less often, which makes policy changes take longer to land. There's no good setting.
Long polling breaks that trade-off: the agent gets a near-instant reaction to a real change (the server responds the moment policy updates) and there's almost no idle chatter (one held-open connection per agent instead of hundreds of empty round-trips).
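The trade-off is easy to see with back-of-the-envelope arithmetic. The sketch below compares steady-state request rates for a fleet of 10,000 agents (an illustrative size within the "thousands to tens of thousands" range) under a hypothetical 5-second short-poll interval versus a ~300-second long-poll hold with no policy changes. The function name and the constants are assumptions for illustration, not Fleet settings.

```python
def requests_per_minute(agents: int, seconds_per_request: float) -> float:
    """Steady-state rate when each agent completes one request
    every `seconds_per_request` seconds."""
    return agents * 60.0 / seconds_per_request


AGENTS = 10_000          # illustrative fleet size
SHORT_POLL_INTERVAL = 5  # seconds; hypothetical "every few seconds" setting
LONG_POLL_HOLD = 300     # seconds; the ~5 minute hold when nothing changes

short = requests_per_minute(AGENTS, SHORT_POLL_INTERVAL)
long_ = requests_per_minute(AGENTS, LONG_POLL_HOLD)

print(f"short polling: {short:,.0f} requests/min")  # 120,000
print(f"long polling:  {long_:,.0f} requests/min")  # 2,000
print(f"reduction:     {short / long_:.0f}x")       # 60x
```

Short polling at 5 seconds also caps reaction time at about 5 seconds, while the long poll answers as soon as the policy changes, so the lower request rate does not cost any responsiveness.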
Why not server push (#2)?
Pushing would mean the server initiating connections inward to every agent. Agents are all over the place — behind NAT, firewalls, in private subnets, on laptops, on hostNetwork pods like yours. The server usually can't reach them, and you'd need inbound rules everywhere. Long polling flips it: the agent always dials out to the server, the connection is already established and held open, and the server pushes the change down that existing agent-initiated connection the instant it happens. You get push-like latency with poll-like (outbound-only) connectivity. That's exactly why your agents work behind hostNetwork with no inbound exposure.
So what's actually happening in those ~300s
A check-in is essentially: "Here's my current state; tell me the moment my policy differs." The server parks that request. Two things can end it:
- A policy change occurs → server responds immediately with the new policy (could be 2 seconds in).
- Nothing changes → the server lets the request time out at its poll ceiling (~5 min), responds "no change", and the agent immediately opens a fresh one.
Since your policies rarely change, almost every check-in runs the full clock and returns at ~300s. That's the held connection the ALB measures as TargetResponseTime.
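The two outcomes can be reproduced with a toy long-poll handler. This is a minimal sketch of the general technique using asyncio, not Fleet Server's actual implementation. The function names are invented for the example, and the poll ceiling is scaled down from ~300 s to 0.3 s so it runs quickly.

```python
import asyncio
import time


async def check_in(policy_changed: asyncio.Event, poll_ceiling: float) -> str:
    """Hold the request until the policy changes or the ceiling is reached."""
    try:
        await asyncio.wait_for(policy_changed.wait(), timeout=poll_ceiling)
        return "new policy"
    except asyncio.TimeoutError:
        return "no change"


async def timed_check_in(change_after, poll_ceiling):
    """Run one check-in and measure how long the server held it open.

    If `change_after` is not None, a policy change fires after that many seconds.
    """
    policy_changed = asyncio.Event()
    if change_after is not None:
        asyncio.get_running_loop().call_later(change_after, policy_changed.set)
    start = time.monotonic()
    result = await check_in(policy_changed, poll_ceiling)
    return result, time.monotonic() - start


async def main():
    # Scaled down: 0.3 s stands in for the ~300 s poll ceiling.
    quiet = await timed_check_in(change_after=None, poll_ceiling=0.3)
    changed = await timed_check_in(change_after=0.05, poll_ceiling=0.3)
    return quiet, changed


quiet, changed = asyncio.run(main())
print(quiet[0], f"held for {quiet[1]:.2f}s")      # runs the full clock
print(changed[0], f"held for {changed[1]:.2f}s")  # returns early
```

The quiet check-in is held for the whole ceiling and returns "no change", which is the case a load balancer would record as a ~300 s response. The check-in that sees a change returns almost immediately.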
Why this specifically fools the ALB
From the ALB's point of view, "request received → response sent" took 300 seconds, so it reports TargetResponseTime ≈ 300s. The ALB can't tell the difference between "the backend was slow for 300s" (bad) and "the backend intentionally held an idle long-poll for 300s" (normal). That ambiguity is the whole reason your generic 0.8s threshold misfires — and why the right fix (DOP-833) is to exempt this endpoint rather than treat it as latency. It's also why the Ingress sets idle_timeout.timeout_seconds=300: the ALB has to be told to tolerate the held connection, otherwise it would sever the long poll before the agent's poll cycle completes.
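As a rough illustration of what "exempt this endpoint" could look like, here is a sketch of an alert rule that applies the generic 0.8s threshold but skips load balancers known to carry long polls. Everything here is hypothetical: the Sample type, the LONG_POLL_LOAD_BALANCERS set, the should_alert function and the "some-other-alb" name are invented for the example, and the actual change made under DOP-833 may look quite different.

```python
from dataclasses import dataclass

LATENCY_THRESHOLD_S = 0.8  # the generic threshold mentioned above

# Hypothetical exemption list: load balancers known to carry long polls.
LONG_POLL_LOAD_BALANCERS = {"k8s-elastics-eckfleet-a1b2c3d4e5"}


@dataclass
class Sample:
    load_balancer: str
    target_response_time_s: float


def should_alert(sample: Sample) -> bool:
    """Apply the latency threshold, skipping load balancers that long-poll."""
    if sample.load_balancer in LONG_POLL_LOAD_BALANCERS:
        return False  # held-open long polls are expected, not slow backends
    return sample.target_response_time_s > LATENCY_THRESHOLD_S


fleet = Sample("k8s-elastics-eckfleet-a1b2c3d4e5", 300.0)
other = Sample("some-other-alb", 1.2)  # hypothetical ordinary ALB
print(should_alert(fleet))  # False
print(should_alert(other))  # True
```

The design point is that the exemption is keyed on the endpoint rather than on a raised threshold, so ordinary services keep the 0.8s check while the Fleet check-in path is never evaluated as latency.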
In short: Fleet holds the connection open so it can deliver policy changes near-instantly without either hammering the server with empty polls or needing to reach inward to firewalled agents. The ~300s is just an idle long-poll waiting for a change that usually never comes during that window — efficient by design, and only a problem for a latency metric that doesn't know to expect it.