Chapter 11 · Agents · 10 min

From model that replies to model that acts

Tool use, ReAct loop, multi-step tasks. How an LLM becomes an agent capable of acting in the world.

Until now, every chapter described an LLM in a passive role: it receives a prompt, it generates a response, done. The model does nothing in the world — it produces text.

But something has changed. LLMs can now call tools: search the web, execute code, read a file, send an email. And with this capability, a completely different architecture becomes possible.

The limit of the toolless model

Ask an LLM without internet access: "What is Apple's stock price right now?"

It'll make up something plausible. Or it'll say it doesn't know. In either case, it can't answer correctly, because the information isn't in its parameters.

Give it a tool — a real-time stock API — and the answer becomes trivial. The model doesn't need to learn this information during training. It gets it at the moment it needs it.

The ReAct loop

The reference architecture for agents is called ReAct (Reasoning + Acting). It cycles through three phases:

Think — The model analyzes the situation. It writes its reasoning chain: "To answer this question, I need X. I'll call tool Y with these parameters."

Act — The model generates a structured tool call: the tool name and its parameters. The system executes the call and retrieves the result.

Observe — The tool result is injected into the context. The model sees what happened and decides what to do next.

Then the cycle repeats, until the model decides it has enough information to answer.
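Here is what that loop looks like in code: a minimal sketch, not a production agent. call_llm stands in for any chat-completion API, the TOOLS registry holds ordinary Python stub functions, and the JSON call format is an assumption for illustration; each vendor has its own.

import json

# Tool registry: plain Python functions (demo stubs).
TOOLS = {
    "calculator": lambda expression: str(eval(expression)),  # demo only: never eval untrusted input
    "web_search": lambda query: f"(stub) results for: {query}",
}

def parse_tool_call(text):
    """Return {"tool": ..., "args": {...}} if the text is a tool call, else None."""
    try:
        call = json.loads(text)
        return call if isinstance(call, dict) and "tool" in call else None
    except json.JSONDecodeError:
        return None  # plain prose: treat it as the final answer

def react_loop(call_llm, question, max_steps=5):
    """Think / Act / Observe until the model answers in plain text."""
    context = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        reply = call_llm(context)                             # Think: one prediction pass
        context.append({"role": "assistant", "content": reply})
        call = parse_tool_call(reply)
        if call is None:
            return reply                                      # no tool call: we're done
        result = TOOLS[call["tool"]](**call.get("args", {}))  # Act: execute the tool
        context.append({"role": "tool", "content": result})   # Observe: feed the result back
    return "Stopped: step limit reached."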

Step through it

Consider three tasks: a simple one (a calculation), one that requires an API call, and one that chains multiple tools. Tracing each one turn by turn shows the model's reasoning and how the context accumulates.

The loop is always the same: the model thinks, picks a tool, reads the result, and starts again. Each cycle is a fresh token prediction — the "agent" emerges from an ordinary function-calling LLM, not from a new architecture.

How tools are defined

A tool isn't a piece of code the model "magically understands." It's a structured definition — name, description, parameters — that the model sees in its context:

{
  "name": "web_search",
  "description": "Searches for recent information on the web.",
  "parameters": {
    "type": "object",
    "properties": {
      "query": { "type": "string", "description": "The search query" }
    },
    "required": ["query"]
  }
}

The model learned during training to produce tool calls in this format. When it "chooses to use a tool," it simply generates text that looks like an API call.

The system detects this text, executes the real call, and returns the result to the context.
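Concretely, the detect-execute-inject step is just text handling. A minimal sketch, with a stubbed tool and a made-up call format:

import json

TOOLS = {"web_search": lambda query: f"(stub) results for: {query}"}

# The model emits plain text that happens to parse as JSON:
model_output = '{"tool": "web_search", "args": {"query": "Apple stock price"}}'

call = json.loads(model_output)                     # detect the tool call
result = TOOLS[call["tool"]](**call["args"])        # execute the real call
observation = {"role": "tool", "content": result}   # inject back into the context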

The growing context

Each iteration of the loop adds tokens to the context: the reasoning, the tool call, the result. A complex task requiring five iterations can easily consume several thousand tokens.

That's why agents tend to be slower and more expensive than simple question-answer exchanges. And that's why context management — knowing what to keep, what to summarize, what to discard — is an open problem in the design of agentic systems.
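There is no standard solution, but the simplest policies are easy to sketch. Here is a naive trimming rule (character counts serve as a rough token proxy; real systems summarize rather than drop):

def trim_context(messages, max_chars=8000):
    """Keep the system prompt and the original task, then as many of
    the most recent messages as fit. Everything else is dropped."""
    head = messages[:2]                        # system prompt + task
    budget = max_chars - sum(len(m["content"]) for m in head)
    tail = []
    for m in reversed(messages[2:]):           # newest first
        budget -= len(m["content"])
        if budget < 0:
            break
        tail.append(m)
    return head + tail[::-1]                   # restore chronological order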

Tool use vs fine-tuning

A natural question: why teach the model to use tools rather than directly teaching it the information?

Several reasons:

Data changes. A stock price, the weather, the state of a database — this information changes constantly. No training can capture it.

Precision. A calculation, a SQL query, a unit conversion — tools are deterministic and exact. LLMs are not.

Modularity. Giving a new tool to a model takes a few lines. Retraining a model to integrate a new skill takes weeks and millions of dollars.

Planning and task decomposition

The most capable agents go beyond a purely linear loop. They decompose a complex task into subtasks, execute some in parallel, and combine the results.

For example, "write a comparative report on three competitors" can decompose into: look up each competitor's information (three parallel calls), then synthesize the results.

This is still an active area of research. Current LLMs do reasonable planning on short tasks, but drift easily on long, complex ones.

The risks of agents

With the ability to act comes the ability to cause harm.

Irreversible actions. An agent with access to your mailbox can send an email, and there is no undo. Good agentic architectures distinguish read actions (harmless) from write actions (which require confirmation).

Infinite loops. Without guardrails, an agent can get stuck in a loop: it searches for information, doesn't find it, rephrases, searches again… indefinitely. A hard step budget is the usual fix; a minimal guard combining it with a write-action gate is sketched after this list.

Reward hacking. If the objective is poorly specified, an agent can find unexpected shortcuts to maximize its score — without doing what you actually wanted.

Hallucinations about tools. The model can invent calls to tools that don't exist, or call real tools with incorrect parameters.
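Several of these risks have cheap partial defenses. A minimal sketch, with hypothetical tool names:

def guarded_execute(tool_name, args, tools, step, max_steps=10):
    """Execute a tool call only if it passes basic guardrails."""
    WRITE_TOOLS = {"send_email", "delete_file"}    # hypothetical irreversible actions
    if step >= max_steps:                          # infinite-loop guard
        raise RuntimeError("Step budget exhausted: aborting the loop.")
    if tool_name not in tools:                     # hallucinated tool
        return f"Error: no tool named '{tool_name}'."
    if tool_name in WRITE_TOOLS:                   # irreversible: ask the user first
        answer = input(f"Allow {tool_name}({args})? [y/N] ")
        if answer.strip().lower() != "y":
            return "Action refused by the user."
    return tools[tool_name](**args)

Returning errors as strings rather than crashing lets the model read the failure and correct itself on the next turn, which is the standard response to hallucinated tool calls.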

MCP: toward a standard for tools

Early on, every vendor defined its own tool-use format: OpenAI had function calling, Anthropic had its own protocol, and every agent framework reinvented the wheel. The result: incompatibilities, integrations rebuilt for each model, and a fragmented ecosystem.

In November 2024, Anthropic published the Model Context Protocol (MCP): an open standard for describing tools, resources, and prompts in a model-independent way. An MCP server exposes a set of tools (e.g., "read this file", "query this database"). Any MCP-compatible client — Claude Desktop, Cursor, VSCode extensions, agent frameworks — can connect to it.
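For a sense of scale, here is a complete MCP server, using the official Python SDK's FastMCP helper (pip install mcp; API as of the 1.x SDK, so check the current docs):

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-tools")

@mcp.tool()
def read_file(path: str) -> str:
    """Return the contents of a text file."""
    with open(path, encoding="utf-8") as f:
        return f.read()

if __name__ == "__main__":
    mcp.run()   # serves over stdio; any MCP client can connect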

The recurring analogy: MCP is to LLMs what USB-C is to peripherals. A common port.

Adoption was fast: OpenAI, Microsoft, and most major vendors announced MCP support in 2025. It has become the de facto protocol for tool use.

Code interpreter, sandboxes, computer use

A few particularly important tool classes:

Code interpreter. A Python sandbox (sometimes JavaScript) where the model can run arbitrary code. Precise calculations, data manipulation, chart generation — anything LLMs do badly natively can be delegated to Python. Available from OpenAI, Anthropic, and Google; a crude version is sketched after this list.

Browser / web automation. A tool that lets the model click, scroll, fill forms on web pages. Anthropic calls this computer use; OpenAI offers Operator. Still fragile, but evolving fast.

File system & shell. A tool that gives access to a virtual disk and a terminal. The core of "coding agents" like Cursor, Cline, Aider, Claude Code.

Long-term memory

An agent's context grows during a task, but it is bounded and discarded once the conversation ends. So how does an assistant recognize you in the next conversation? Through external long-term memory.

Several approaches:

  • Vector memory — each important interaction is summarized and stored as an embedding. At the start of each new conversation, relevant memories are retrieved, essentially RAG applied to memories (a minimal sketch follows this list).
  • Structured user profile — the agent maintains a dossier on the user (preferences, ongoing projects, history).
  • Procedural memory — the agent keeps track of recipes that worked ("to summarize a paper, follow these steps").
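The sketch promised above: a toy vector memory. embed stands in for any real embedding model, and retrieval is plain cosine similarity:

import numpy as np

class VectorMemory:
    """Store summaries as embeddings; recall the most similar ones."""
    def __init__(self, embed):
        self.embed = embed            # function: text -> np.ndarray
        self.texts, self.vecs = [], []

    def store(self, summary):
        self.texts.append(summary)
        self.vecs.append(self.embed(summary))

    def recall(self, query, k=3):
        q = self.embed(query)
        sims = [float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v)))
                for v in self.vecs]
        best = np.argsort(sims)[::-1][:k]     # indices of the top-k matches
        return [self.texts[i] for i in best]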

ChatGPT introduced memory in 2024, Claude in 2025. It's one of the most active areas of agent design.

Multi-agent: several LLMs collaborating

A recent trend: instead of a single agent, have multiple specialized LLMs collaborate.

An orchestrator agent receives the task, decomposes it and delegates to specialized agents (a code expert, a web search expert, a checker). Results come back to the orchestrator, which combines them.
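Structurally, this is a dispatch table plus a synthesis step. A toy sketch in which the specialists and the plan are hard-coded; a real orchestrator would ask an LLM to produce both:

SPECIALISTS = {                      # each would wrap its own LLM + tools
    "search": lambda task: f"[search agent] findings for: {task}",
    "code":   lambda task: f"[code agent] solution for: {task}",
    "review": lambda task: f"[review agent] critique of: {task}",
}

def orchestrate(task):
    plan = [("search", task), ("code", task), ("review", task)]
    results = [SPECIALISTS[role](subtask) for role, subtask in plan]
    return "\n\n".join(results)      # synthesis, normally another LLM call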

This architecture resembles a human organization — and has the same advantages (parallelization, specialization) and the same problems (communication, coordination, information loss between agents).

What this changes

The LLM is no longer an oracle you query — it's a brain you connect to arms.

This transition is still recent. Current agents are impressive on well-defined tasks and fragile on long, ambiguous ones. But the pace is fast, and understanding the fundamental architecture — ReAct, tool use, growing context — is the best starting point.
