Most AI Features Don't Need an Agent

Engineering, AI

The agent hype has a simple structure: something is powered by AI, therefore it should be agentic. A doc search feature? Make it an agent that retrieves, reflects, and verifies. A customer support responder? Build a multi-agent pipeline with a router, a responder, and a quality-check layer.

This is usually a mistake.

What an agent actually is

An agent is a system where an LLM decides what actions to take next, based on what it's already done. It uses tools, observes results, and iterates until a stopping condition is reached. The key word is decides — the path isn't fixed. The model is in control of the loop.

That's a powerful capability. It's also expensive, slow, and hard to test. When you need it, nothing else will do. When you don't need it, you've added complexity that serves no one.
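For concreteness, here is a minimal Python sketch of the loop described above. Everything in it is an assumption made for illustration: the `TOOLS` registry, the `scripted_model` stand-in (a real system would call an LLM here), and the `finish` action are hypothetical names, not any framework's API.

```python
from typing import Callable

# Hypothetical tool registry; names and behavior are illustrative only.
TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda query: f"results for {query!r}",
    "read": lambda path: f"contents of {path}",
}

def scripted_model(history: list[tuple[str, str, str]]) -> tuple[str, str]:
    """Stand-in for an LLM call: picks (action, argument) from what has happened so far."""
    if not history:
        return ("search", "stack trace")
    if history[-1][0] == "search":
        return ("read", "app/main.py")
    return ("finish", "diagnosis: missing null check")

def run_agent(decide, max_steps: int = 10) -> tuple[str, list]:
    history: list[tuple[str, str, str]] = []
    for _ in range(max_steps):
        action, arg = decide(history)        # the model chooses the next step
        if action == "finish":               # stopping condition
            return arg, history
        observation = TOOLS[action](arg)     # use a tool
        history.append((action, arg, observation))  # observe, then iterate
    raise RuntimeError("step limit reached without an answer")

answer, trace = run_agent(scripted_model)
```

The point of the sketch is the shape: the sequence of tool calls lives in `decide`, not in your code, so you can't read the path off the source.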

What most AI features actually are

Most features described as "using AI" are simple pipelines:

  1. Take user input
  2. Retrieve relevant context (maybe)
  3. Format a prompt
  4. Call the model
  5. Post-process the output
  6. Return the result

The control flow is fixed, even if the model's wording varies from run to run. The steps don't change based on model output. There's no loop. There's no tool-calling. There's no agent.

If this describes your feature, don't build an agent. Build a pipeline.
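The six steps above can be sketched as a plain chain of function calls. This is an illustration under assumptions, not a reference implementation: the keyword-overlap retrieval, the prompt template, and the `call_model` parameter (any function from prompt string to text) are all hypothetical choices.

```python
from typing import Callable

def retrieve_context(query: str, documents: dict[str, str]) -> list[str]:
    """Naive retrieval: keep documents that share at least one word with the query."""
    words = set(query.lower().split())
    return [text for text in documents.values() if words & set(text.lower().split())]

def build_prompt(query: str, context: list[str]) -> str:
    joined = "\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {query}\nAnswer:"

def run_pipeline(query: str, documents: dict[str, str],
                 call_model: Callable[[str], str]) -> str:
    context = retrieve_context(query, documents)  # 2. retrieve relevant context
    prompt = build_prompt(query, context)         # 3. format a prompt
    raw = call_model(prompt)                      # 4. call the model
    return raw.strip()                            # 5-6. post-process and return

# Usage with a stub in place of a real model call (placeholder documents).
docs = {"refunds": "Our refund policy allows returns", "shipping": "Orders ship Monday"}
result = run_pipeline("what is our refund policy?", docs, lambda prompt: "  stub answer \n")
```

Every call runs the same four lines in the same order, whatever the model returns.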

A concrete example

We were asked to add a "summarize this thread" feature to a project management tool. Initial plan: the user clicks summarize, the app calls the model, displays a summary. A few hours of work.

Before shipping, someone suggested making it "more powerful" with an agentic loop: first, the agent retrieves related threads for context; then decides if it has enough information; then optionally fetches linked documents; then writes the summary.

We built the agentic version first, as a prototype, and compared it against the simple pipeline. Results:

  • Median latency went from 1.2s to 7.8s
  • The model would occasionally decide it needed more context and loop 4–5 times, costing 10–15x more per request
  • In roughly 8% of runs, the agent got stuck in a retrieval loop and timed out
  • Testing became effectively impossible — the same input could produce different tool-calling sequences

We shipped the simple pipeline. Users don't notice any difference in output quality. They do notice the speed.

The actual decision criterion

Use an agent when the task genuinely requires autonomy:

  • The steps can't be known in advance because they depend on what was found in previous steps
  • The system needs to explore, verify, or recover from intermediate failures
  • The task is open-ended enough that a fixed pipeline would need to enumerate too many branches

A code debugging assistant is a good agent candidate. The model reads an error, decides what file to look at, reads it, decides whether to run a test, reads the output, and iterates until it has a diagnosis. You can't hardcode that path.

A "what is our refund policy?" customer support bot is not a good agent candidate. Retrieve the policy doc, inject it into the prompt, answer the question. Done.

The distinguishing question is whether the model needs to determine its own path or just execute a known one. Almost everything falls into the second category.

The failure modes agents introduce

Beyond latency and cost, agents have failure patterns that pipelines don't:

Loops and timeouts. An agent that hasn't found satisfactory context will keep searching. You need loop limits, and those limits are arbitrary. Set them too low and the agent gives up too early; set them too high and you're paying for unproductive iterations.
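A loop limit usually ends up as two budgets, a step count and a wall-clock deadline. A minimal sketch, assuming a hypothetical `step` callable that returns `None` until the agent is satisfied; the default values are placeholders, which is exactly the problem:

```python
import time

class BudgetExceeded(Exception):
    """Raised when the loop runs out of steps or time."""

def run_with_budget(step, max_steps: int = 5, max_seconds: float = 10.0,
                    clock=time.monotonic):
    """Call step() until it returns a non-None result or a budget runs out."""
    deadline = clock() + max_seconds
    for attempt in range(1, max_steps + 1):
        if clock() > deadline:
            raise BudgetExceeded(f"time budget exhausted before step {attempt}")
        result = step()
        if result is not None:
            return result, attempt
    raise BudgetExceeded(f"no result after {max_steps} steps")
```

Nothing in the code tells you whether 5 or 50 is the right number; that has to come from measuring real traffic.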

Partial completion. A pipeline either runs to completion or fails cleanly. An agent can complete three of five steps, fail on the fourth, and leave you with a partial result that's harder to handle than a clean error.
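One way to make partial completion at least visible is to return a result object that records which steps finished and where the run stopped. A sketch under assumptions; `RunResult` and `run_steps` are hypothetical names, and the caller still has to decide what to do with a half-finished run:

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class RunResult:
    completed: dict[str, str] = field(default_factory=dict)  # step name -> output
    failed_step: Optional[str] = None
    error: Optional[str] = None

    @property
    def ok(self) -> bool:
        return self.failed_step is None

def run_steps(steps: list[tuple[str, Callable[[], str]]]) -> RunResult:
    """Run steps in order; stop at the first failure but keep earlier outputs."""
    result = RunResult()
    for name, fn in steps:
        try:
            result.completed[name] = fn()
        except Exception as exc:
            result.failed_step = name
            result.error = str(exc)
            break
    return result
```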

Non-determinism as a bug. In a pipeline, the same input reliably produces the same sequence of operations. In an agent loop, the model's tool selection can vary run to run. That variation makes regression testing fragile.
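One possible workaround, offered as a sketch and not as what we did: instead of asserting an exact tool sequence, a test can assert properties that every acceptable run should satisfy. The tool names and the specific invariants below are assumptions for illustration.

```python
ALLOWED_ACTIONS = {"search", "read", "run_tests", "finish"}

def check_trace(trace: list[str], max_steps: int = 6) -> list[str]:
    """Return the invariants a trace violates; an empty list means it is acceptable."""
    problems = []
    if len(trace) > max_steps:
        problems.append(f"too many steps: {len(trace)}")
    unknown = sorted(set(trace) - ALLOWED_ACTIONS)
    if unknown:
        problems.append(f"unknown actions: {unknown}")
    if not trace or trace[-1] != "finish":
        problems.append("trace does not end with finish")
    return problems

# Two different orderings both pass, which exact-sequence assertions would not allow.
run_a = check_trace(["search", "read", "finish"])
run_b = check_trace(["read", "search", "read", "finish"])
```

This trades precision for stability: the test no longer breaks when the model reorders its calls, but it also catches fewer regressions than a pipeline's exact-output test.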

These problems are solvable. Solving them adds weeks of engineering work and ongoing maintenance. Only take that on when the autonomy is genuinely necessary.

Why this matters when evaluating a studio

Any firm can spin up an agent loop in a weekend — the frameworks make it trivial. The architectural judgment is knowing when not to. A studio that defaults to agents for every AI feature is optimizing for impressive demos, not for software that works reliably at scale.

Ask the question directly: why does this feature need an agent rather than a pipeline? If the answer involves the word "powerful" without explaining what specific decisions the model needs to make autonomously, you're looking at unnecessary complexity being sold as sophistication.

The right AI architecture for most products in 2026 is a set of well-designed pipelines with clear inputs and outputs, deployed deterministically, tested like any other code. Agents for the handful of features that genuinely need autonomy. Not the reverse.