Wednesday, 3 June 2026

Some notes on LLMs

 

What does e.g. 1M context mean in a model description?


"1M context" means the model can handle up to 1 million tokens in its context window — the combined input (your messages, files, tool results, system prompt) plus output it can consider at once. For reference, that's roughly 750,000 words or a few thousand pages of text, letting it work with very large codebases or long conversations without losing earlier content.

What is a LLM's tool-use loop? 


An LLM tool-use loop is the pattern where you let a language model drive an investigation by repeatedly choosing tools to call, rather than answering in one shot.

The shape


  1. Send: system prompt + user request + list of available tools (with JSON schemas)
  2. Model responds with either:
       (a) a final text answer  -> exit loop
       (b) a "tool_use" block: { name: "run_aws_cli", input: { args: [...] } }
  3. Your code executes that tool, captures the result
  4. Append the tool result to the conversation as a "tool_result" message
  5. Send the whole conversation back to the model
  6. Goto 2

The model never executes anything itself — it just emits requests to call tools. Your code is the runtime that actually runs them and feeds the output back.

Why it's a loop

Each turn the model sees everything it has learned so far (prior tool calls + their outputs) and decides the next step based on that. So a real run looks like:

  - Turn 1: model calls cloudwatch describe-alarms --state-value ALARM
  - Turn 2: sees 3 alarms, picks the noisiest, calls logs filter-log-events for that log group around the alarm time
  - Turn 3: sees an error pattern, calls kubectl describe pod on the affected workload
  - Turn 4: emits final Markdown report, no tool call → loop exits

  The model is doing the planning; your code is the dispatcher.

Why you need a budget

Without limits the loop can spin forever — the model keeps finding "one more thing to check." Hence in agent.run():

  - max_iterations=30 — hard cap on turns
  - max_tokens_per_turn=12288 — cap on a single response
  - Per-tool wall-clock timeouts (60 s for CLI, 30 s for HTTP)
  - Output truncation (50 000 char stdout) so a giant tool result doesn't blow the context window

How it ends

The loop terminates when the model returns a response with no tool_use block — that's the "I'm done, here's the answer" signal (stop_reason: end_turn). Or when you hit a budget limit and force-stop it.

Where the safety lives

Because the model can ask for arbitrary tool calls, the loop is only as safe as the tool implementations. That's why when implementing agents we should have the allowlists (services, verbs, paths) - the model can request aws s3 rm, but the validator rejects it before subprocess.run ever sees it.

The "two-pass" design in agent is a refinement: pass 1 is a tool-use loop (gather), pass 2 is a single non-loop call (synthesize). Splitting them lets each prompt focus on one job.


What are those .md files used by AI Agents?


There isn't a universally agreed official name, but people commonly refer to files like CLAUDE.md, GEMINI.md, AGENTS.md, COPILOT_INSTRUCTIONS.md, and .cursorrules as:

  • AI agent instruction files (most generic)
  • Agent configuration files
  • Agent context files
  • LLM instruction files
  • Repository AI instructions
  • Project AI guidelines

In the developer tooling community, "agent instructions" or "agent context files" are probably the most widely understood umbrella terms.

For example:

Tool             File
----               -----
Claude Code       CLAUDE.md
Gemini CLI       GEMINI.md
GitHub Copilot   .github/copilot-instructions.md
OpenAI Codex CLI   AGENTS.md
Cursor             .cursorrules / project rules
Windsurf          Rules files

Collectively, you could describe them as:

"Repository-level AI agent instruction files that provide persistent context and operating rules for coding assistants."

If you're building tooling around them (e.g., in your DevOps work), I'd recommend using "agent instructions" as the generic term because it's vendor-neutral and easily understood across Claude, Gemini, Copilot, Cursor, Codex, and similar tools.



No comments: