You should start with one AI agent and add a team only when a single agent, even with more tools, can no longer reliably finish the job. That is the answer the two loudest camps in this debate actually agree on once you read past the headlines. A team of coordinated agents can beat a single agent by a wide margin on the right kind of work (Anthropic measured a 90.2% improvement over a single agent on its internal research evaluation), but it also burns roughly 15x the tokens of a normal chat and adds real coordination overhead and new failure modes. So the decision is not "agents are cool, let's use more of them." It is "what does this specific job need, and what will the extra agents cost me?"
This guide gives you a plain rule for that decision, the one number that tells you when a team pays off, and the single trick that resolves the famous disagreement between the "multi-agent wins" crowd and the "don't build multi-agents" crowd. If you would rather we make this call and run it for you, see how we run generative AI architecture. Everything below is still yours to use on your own.
What is a multi-agent system, in plain terms?
A single AI agent is one large language model in a loop: it plans, calls tools, reads the result, and repeats until the job is done. A multi-agent system is several of those agents working on the same goal, usually with one lead agent that breaks the work into pieces and hands each piece to a specialized worker agent, then combines what they send back.
Anthropic describes the most common shape as orchestrator-worker. A lead agent plans and decomposes the query, spawns specialized subagents that search in parallel, and then synthesizes their findings into one answer. OpenAI describes two other common shapes: a manager pattern, where one central agent calls specialist agents as tools and keeps reasoning until it is done, and a handoff pattern, where peer agents pass full control of the task to whichever specialist fits next.
The key thing for a business owner is what this buys you and what it costs. More agents let you explore many directions at once and hold more total information than a single context window can fit. In exchange you add coordination, latency, and cost. The whole art is knowing when that trade is worth it.
When is one agent the right answer?
Most of the time. This is the part the hype skips. OpenAI's official build guide is explicit: start with one agent plus tools, and a single agent can handle a surprising range of work (customer service, multi-step analysis, content generation, code review) just by adding tools incrementally. Microsoft's Azure architecture guidance lands in the same place, calling a single agent with tools the right default for enterprise use cases, and advising teams to use the lowest level of complexity that reliably meets the requirement.
The logic is a spectrum of increasing cost:
- A direct model call (just ask the model).
- A single agent with tools (the model can now act on the world).
- A multi-agent system (several coordinating agents).
Each step up adds coordination overhead, latency, and cost. So you only climb a step when the step below stops being reliable. For a large share of real business jobs, a single well-built agent with the right tools is the ceiling you never actually hit.
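The "climb only when the step below stops being reliable" rule can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the tier names, the `lowest_reliable_tier` function, and the 0.95 pass-rate bar are all assumptions made for the example, and the pass rates would come from your own evaluation runs.

```python
from __future__ import annotations

# Tiers ordered from cheapest to most expensive, following the spectrum above.
TIERS = ("direct_model_call", "single_agent_with_tools", "multi_agent_system")


def lowest_reliable_tier(
    pass_rates: dict[str, float],
    required_pass_rate: float = 0.95,  # hypothetical reliability bar
) -> str | None:
    """Return the cheapest tier whose measured pass rate meets the bar.

    Returns None when no measured tier is reliable enough, which suggests
    reworking the task or the evaluation rather than adding more agents.
    """
    for tier in TIERS:
        rate = pass_rates.get(tier)  # tiers that were never tested are skipped
        if rate is not None and rate >= required_pass_rate:
            return tier
    return None


# Hypothetical evaluation results for one workflow.
measured = {"direct_model_call": 0.62, "single_agent_with_tools": 0.97}
print(lowest_reliable_tier(measured))
```

Because the loop walks the tiers in cost order, a multi-agent system is only ever chosen when both cheaper tiers have been measured and found wanting.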
A useful gut check: one capable agent with good tools and good memory is usually enough for a workflow that one focused person could do if they had all day. You only need a team when the job is genuinely too big or too varied for that.
When do you actually need a team of agents?
You move to a multi-agent system when your evaluations (your tests) show a single agent struggling, not because a team sounds more powerful. Three triggers recur across the published guidance:
- The prompt and logic get too complex for one agent. The instructions become so long and branchy that the model loses the thread.
- Tool overload. The agent has so many tools that it picks the wrong one or gets confused about which to use.
- Distinct domains or security boundaries. The work spans areas that genuinely need separate context or separate access (for example, a finance step and a customer-data step that should not see each other's information).
One important caveat on tool overload: tool count alone is not the threshold. OpenAI notes that some agents get confused with fewer than 10 tools, usually because of poor instructions or bad tool descriptions, not because 9 tools is too many. The real trigger is reliability under evaluation, not a magic number. Fix the tool descriptions first. If the single agent still cannot do it reliably, then you split.
There is also a positive case where a team clearly earns its place: breadth-first work. Anthropic found multi-agent systems shine on open-ended research that needs dynamic exploration across many independent directions and an aggregate of information larger than one context window can hold. If your job is "go find and weigh fifty different things at once," that is what a team is for.
How much more does a team of agents cost?
This is the number almost no buyer-facing article puts in front of you, and it should change how you decide. From Anthropic's production data:
- A single agent already uses about 4x the tokens of a normal chat interaction.
- A multi-agent system uses about 15x the tokens of a chat.
Tokens are what you pay for. So a coordinated team can be roughly 15 times more expensive per task than a simple chat, and nearly four times more expensive (15 ÷ 4 ≈ 3.75) than a single agent doing the same work. That is not a reason to avoid multi-agent systems. It is the reason to be honest about when they pay off.
The math is simple. A team of agents only makes sense when the extra cost buys a result you could not get otherwise, and that result is worth the spend. Anthropic's own framing is to scale effort to query complexity: cheap jobs get a cheap setup, and you reserve the expensive multi-agent machinery for high-value tasks where parallel breadth genuinely improves the answer. A 15x bill on a low-stakes task is just waste. The same 15x on a high-value research or analysis job that a single agent simply could not complete is a bargain.
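To make that arithmetic concrete, here is a small sketch. The 4x and 15x multipliers are the figures quoted above. The per-chat cost, the dollar values, and the function names are hypothetical placeholders you would replace with your own numbers.

```python
# Token multipliers relative to a normal chat, as quoted above.
SINGLE_AGENT_MULTIPLIER = 4.0
MULTI_AGENT_MULTIPLIER = 15.0


def extra_cost_of_team(chat_cost: float) -> float:
    """Extra spend per task for a team compared with a single agent."""
    return chat_cost * (MULTI_AGENT_MULTIPLIER - SINGLE_AGENT_MULTIPLIER)


def team_is_worth_it(
    chat_cost: float,
    value_with_team: float,
    value_with_single_agent: float,
) -> bool:
    """True when the extra value a team delivers exceeds its extra cost."""
    extra_value = value_with_team - value_with_single_agent
    return extra_value > extra_cost_of_team(chat_cost)


# Hypothetical inputs: a chat-sized task costs $0.02 in tokens.
print(MULTI_AGENT_MULTIPLIER / SINGLE_AGENT_MULTIPLIER)  # 3.75
print(team_is_worth_it(0.02, value_with_team=50.0, value_with_single_agent=5.0))
print(team_is_worth_it(0.02, value_with_team=0.10, value_with_single_agent=0.05))
```

With a hypothetical chat cost of $0.02 per task, the single agent costs about $0.08 and the team about $0.30, so the team has to add more than roughly $0.22 of value per task before it pays for itself.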
How big is the upside when a team is the right call?
Real, when the task fits. On Anthropic's internal research evaluation, a multi-agent system (a Claude Opus 4 lead with Claude Sonnet 4 subagents) outperformed a single Claude Opus 4 agent by 90.2%. On a separate browsing benchmark (BrowseComp), token usage alone explained 80% of the performance variance, and three factors together (token usage, number of tool calls, and model choice) explained 95%. In plain terms: on this kind of broad, exploratory work, throwing more coordinated effort at the problem really does produce better answers.
Speed improves too. Parallelization, where the lead spawns several subagents at once and each subagent uses multiple tools at once, cut research time by up to 90% on complex queries. Many hands genuinely finish faster here, because the directions are independent.
Hold both facts side by side. The same setup that delivers a 90.2% quality gain is also the one that costs about 15x a chat. That is the trade in a nutshell: a team buys you breadth, depth, and speed on big parallelizable jobs, and you pay for it in tokens and coordination. The skill is matching the machinery to the task.
The debate everyone is arguing about, resolved
Here is the tension you have probably bumped into. Anthropic and Microsoft document multi-agent systems as a powerful pattern with real wins. Cognition, the team behind the Devin coding agent, published a piece bluntly titled "Don't Build Multi-Agents," arguing that single-threaded agents are more reliable. Read one and you will buy a team. Read the other and you will swear them off. Both are right, and the reconciliation is the most useful thing in this article.
Cognition's argument is concrete. Naive multi-agent setups are fragile because decision-making gets too dispersed and context is not shared thoroughly enough between agents. Their principle: actions carry implicit decisions, and conflicting decisions carry bad results. Their example is a Flappy Bird clone where one subagent builds a Super Mario-style background while another builds a bird that does not match, because neither subagent can see the design choice the other quietly made. The pieces do not fit because no one held the whole picture.
LangChain supplies the line that resolves it: read actions are inherently more parallelizable than write actions. Splitting work across agents is great for gathering and reading, where the directions are independent. It is dangerous for writing, where parallel agents make conflicting implicit decisions that then collide. And here is the tell that proves the synthesis: even Anthropic's celebrated research system, the one that beat a single agent by 90.2%, reads with many parallel subagents but synthesizes the final report with a single main agent in one unified call. They parallelize the reading and single-thread the writing.
So the camps are not really in conflict. They are describing different halves of the same rule:
- Parallelize reading and exploration across many agents. This is where the multi-agent upside lives.
- Keep tightly coupled work, shared context, and final writing on one agent. This is what Cognition is protecting.
Multi-agent is the wrong choice when all agents must share the same context, or when there are many dependencies between agents. That is exactly why most coding is better single-threaded: every change depends on every other change, so the context cannot be cleanly split.
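The "parallelize the reading, single-thread the writing" rule can be sketched with Python's standard thread pool. This is an illustration under stated assumptions, not Anthropic's or anyone else's implementation: `read_subagent` and `write_report` are hypothetical stand-ins for real model calls, and they return canned strings so the structure is visible.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor


def read_subagent(subtopic: str) -> str:
    """Hypothetical stand-in for a subagent that researches one subtopic."""
    return f"findings on {subtopic}"


def write_report(question: str, findings: list[str]) -> str:
    """Hypothetical stand-in for the single synthesis call."""
    bullet_lines = "\n".join(f"- {item}" for item in findings)
    return f"Report: {question}\n{bullet_lines}"


def research(question: str, subtopics: list[str]) -> str:
    # Reads are independent of each other, so they can fan out in parallel.
    with ThreadPoolExecutor(max_workers=max(1, len(subtopics))) as pool:
        findings = list(pool.map(read_subagent, subtopics))  # input order is kept
    # One writer sees every finding, so no two agents make conflicting
    # implicit decisions about the final output.
    return write_report(question, findings)


print(research("Who are our main competitors?", ["pricing", "features"]))
```

The design choice to notice is that only `read_subagent` runs concurrently. Everything that shapes the final answer happens in one call with the full set of findings in view.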
The decision in one table
| Your situation | Use one agent | Use a team of agents |
|---|---|---|
| A single agent finishes reliably under your tests | Yes | No, you are adding cost for nothing |
| The prompt and logic are too complex for one agent | No | Yes, after you simplify and still fail |
| Too many tools, even after fixing descriptions | No | Yes, split by domain |
| Distinct security or data boundaries | No | Yes, isolate context |
| Broad, parallelizable research beyond one context window | Possible but slow | Yes, this is the sweet spot |
| Tightly coupled work where every step depends on the last (most coding) | Yes | No, dispersed decisions break it |
| The task is the writing or synthesis step | Yes, single-threaded | No, parallel writing conflicts |
| Low-value, routine job | Yes | No, 15x tokens is not worth it |
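The table above can also be written as a small decision function. Treat this as a sketch: the field names and the `recommend` function are hypothetical, and the order in which the rows are checked is an assumption of this example (the table itself does not rank its rows).

```python
from dataclasses import dataclass


@dataclass
class Job:
    single_agent_passes_evals: bool
    needs_isolated_context: bool = False        # security or data boundaries
    tightly_coupled: bool = False               # e.g. most coding
    is_writing_or_synthesis: bool = False
    low_value: bool = False
    too_complex_after_simplifying: bool = False
    tool_overload_after_fixing_descriptions: bool = False
    broad_parallel_research: bool = False


def recommend(job: Job) -> str:
    # Assumption: a hard data boundary overrides every other signal.
    if job.needs_isolated_context:
        return "team of agents"
    # Rows where the table keeps the work on one agent.
    if job.single_agent_passes_evals or job.low_value:
        return "one agent"
    if job.tightly_coupled or job.is_writing_or_synthesis:
        return "one agent"
    # Rows where a team earns its cost.
    if (
        job.too_complex_after_simplifying
        or job.tool_overload_after_fixing_descriptions
        or job.broad_parallel_research
    ):
        return "team of agents"
    return "one agent"  # default to the simpler setup


print(recommend(Job(single_agent_passes_evals=False, broad_parallel_research=True)))
```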
How a team of agents maps to real business jobs
The orchestration patterns are documented abstractly in the engineering literature. Here is what they look like as everyday work, so you can recognize your own:
- A single triage agent (handoff / routing). One agent reads every incoming request and routes it, or handles it directly, passing control to a specialist only when needed. One active agent at a time. Good for support intake. Watch for routing that loops or sends things to the wrong place.
- A research team (orchestrator-worker / concurrent). A lead agent fans the question out to several reading agents that explore in parallel, then pulls their findings together. This is the breadth-first sweet spot, and it is also where the 15x cost shows up. Worth it for high-value market, competitor, or due-diligence research.
- A maker-checker review loop (group chat). One agent produces work, another checks it, and they iterate. Good for quality control on drafts and analysis. Microsoft recommends keeping this kind of round-table to three or fewer agents to stay in control and avoid endless loops.
- A draft-review-polish pipeline (sequential). Agents hand off in a fixed order, each refining the last one's output. Simple and predictable. The catch is that an early mistake propagates down the line, and there is no parallelism, so it can be slow.
Notice the pattern across all four: the reading and checking fan out, but the final writing and the tightly coupled decisions stay on a single thread. That is the same rule again, in business clothes.
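As one example of these patterns in code, here is a minimal sketch of the maker-checker loop with a hard cap on rounds so it cannot iterate forever. The `maker_checker` function, the `max_rounds` default, and the two demo functions are hypothetical. In a real system `make` and `check` would each wrap a model call.

```python
from typing import Callable, Tuple

MakeFn = Callable[[str, str], str]            # (task, feedback) -> draft
CheckFn = Callable[[str], Tuple[bool, str]]   # draft -> (approved, feedback)


def maker_checker(
    task: str, make: MakeFn, check: CheckFn, max_rounds: int = 3
) -> Tuple[str, bool]:
    """Iterate maker and checker until approval or the round limit."""
    draft, feedback = "", ""
    for _ in range(max_rounds):
        draft = make(task, feedback)
        approved, feedback = check(draft)
        if approved:
            return draft, True
    # Not approved within the limit: return the last draft for human review.
    return draft, False


def demo_make(task: str, feedback: str) -> str:
    """Toy maker: folds the checker's feedback into the next draft."""
    return task if not feedback else f"{task} [{feedback}]"


def demo_check(draft: str) -> Tuple[bool, str]:
    """Toy checker: approves only drafts that mention cited sources."""
    return ("cited" in draft, "add cited sources")


print(maker_checker("summary", demo_make, demo_check))
```

Returning an explicit approved flag, instead of looping until the checker is satisfied, keeps the decision about what to do with an unapproved draft outside the loop.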
The mistakes that sink multi-agent projects
If you do go to a team, these are the failure modes the practitioners keep flagging:
- Splitting too early. Reaching for a team before you have maxed out a single agent. You pay 15x for a problem better instructions would have solved.
- Vague subagent instructions. Anthropic warns that short, vague instructions cause subagents to duplicate each other's work or misinterpret the task. Each subagent needs a clear objective, an output format, tool guidance, and firm task boundaries.
- Parallelizing the writing. The Cognition trap: letting multiple agents make conflicting implicit decisions on coupled work.
- Ignoring distributed-systems reality. Microsoft is blunt that multi-agent systems inherit classic distributed-systems failure modes: node failures, lost messages, cascading errors. Production setups need timeouts and retries, output validation between agents, circuit breakers, and durable external state. This is real engineering, not a prompt you write once.
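To show what some of those safeguards look like, here is a minimal sketch of retries, output validation, and a circuit breaker around one subagent call. All names here (`CircuitBreaker`, `call_subagent`, the thresholds) are hypothetical. Timeouts and durable external state are left out: this sketch assumes `run` enforces its own timeout and raises when it expires.

```python
import time
from typing import Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is refusing further calls."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3) -> None:
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0

    @property
    def is_open(self) -> bool:
        return self.consecutive_failures >= self.failure_threshold

    def record(self, success: bool) -> None:
        # Any success resets the count; each failure increments it.
        self.consecutive_failures = 0 if success else self.consecutive_failures + 1


def call_subagent(
    run: Callable[[], str],
    validate: Callable[[str], bool],
    breaker: CircuitBreaker,
    max_attempts: int = 3,
    backoff_seconds: float = 0.0,
) -> str:
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        if breaker.is_open:
            raise CircuitOpenError("subagent disabled after repeated failures")
        try:
            output = run()
            if not validate(output):  # check output before passing it on
                raise ValueError("subagent output failed validation")
        except Exception as error:
            breaker.record(success=False)
            last_error = error
            time.sleep(backoff_seconds * (2 ** attempt))  # exponential backoff
            continue
        breaker.record(success=True)
        return output
    raise RuntimeError("subagent failed after retries") from last_error
```

A caller would share one `CircuitBreaker` per subagent across tasks, so a subagent that keeps failing is switched off quickly instead of dragging every downstream step into a cascade.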
So which should your business use?
Start with one agent and good tools. It is cheaper, simpler, more predictable, and it is the right default for most enterprise work. Add a team only when your own tests show that a single agent cannot finish the job reliably, and even then, follow the rule the whole field quietly agrees on: parallelize the reading, single-thread the writing, and keep shared-context work together. Reserve the 15x machinery for high-value, broad, parallelizable jobs where the breadth genuinely changes the answer.
That is the decision. Owning it in production (the cost economics, the subagent instructions, the validation between agents, and the failure handling) is a different job. It is the one we do: we make the single-versus-team call with you, build it on the simplest pattern that works, and run it inside your business with the guardrails that keep it reliable. If you want that call made well the first time, book a free consultation below and we will look at your highest-value workflow and tell you, honestly, whether it needs one agent or a team.
