Coordinated AI agents work together using one of five repeatable patterns, and each one is just a different answer to a single question: who does what, in what order, and who checks the result. An orchestrator-worker setup has a lead agent break a big job into pieces and hand them to parallel workers. A sequential pipeline passes work down an assembly line, one stage at a time. A concurrent fan-out runs the same task many ways at once and merges the answers. A handoff or triage router reads each incoming request and sends it to the right specialist. A maker-checker loop has one agent do the work and a second agent review it before it ships. That is the whole toolkit. Everything fancier is a combination of these five.
This article translates those five patterns out of the engineering docs and into the everyday business jobs they actually map to, and it ties each one to its real tradeoff in cost, speed, and reliability. By the end you will know which pattern fits a research task versus a content pipeline versus a support inbox, and why the choice matters to your token bill. If you would rather we make and run this call for you, see how we run generative AI architecture. Everything below is still yours to use on your own.
What does "coordinated agents" even mean?
A single AI agent is a model in a loop with some tools and memory. It plans, acts, observes the result, and repeats until the job is done. That is enough for a surprising amount of work. Both OpenAI and Microsoft are explicit that you should max out a single agent with tools first, because a single agent is often the right default for enterprise use cases.
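To make the loop concrete, here is a minimal sketch of plan, act, observe, repeat. It is an illustration only, not any vendor's API: `fake_model`, `TOOLS`, and `run_agent` are hypothetical names, and the "model" is a hard-coded stub standing in for a real LLM call.

```python
from typing import Callable

# Hypothetical tool registry: tool name -> plain Python function.
TOOLS: dict[str, Callable[[str], str]] = {
    "lookup_price": lambda product: {"widget": "19.99"}.get(product, "unknown"),
}


def fake_model(goal: str, memory: list[str]) -> dict:
    """Stub standing in for an LLM call: choose the next step from memory."""
    if not memory:
        # Nothing observed yet, so plan a tool call.
        return {"action": "lookup_price", "input": "widget"}
    # An observation exists, so finish with an answer built from it.
    return {"action": "finish", "input": f"{goal}: {memory[-1]}"}


def run_agent(goal: str, max_steps: int = 5) -> str:
    memory: list[str] = []
    for _ in range(max_steps):
        step = fake_model(goal, memory)                      # plan
        if step["action"] == "finish":
            return step["input"]
        observation = TOOLS[step["action"]](step["input"])   # act
        memory.append(observation)                           # observe, then repeat
    raise RuntimeError("agent did not finish within max_steps")


answer = run_agent("price of widget")
```

The `max_steps` cap is a design choice worth keeping in any real version: a loop that can only exit when the model says "finish" needs a hard stop.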
You only reach for multiple agents when one agent, even with more tools, can no longer reliably finish the task. That happens when the prompt gets too complex for one set of instructions, when the agent has too many tools to choose between reliably, or when the work spans domains that need separate context or security boundaries. When that line is crossed, you stop asking "how do I make one agent smarter" and start asking "how do I split this across several agents and coordinate them." The five patterns below are the established answers to that coordination question. (For the deeper decision of whether you need a team at all, see our companion piece on one agent versus a multi-agent system.)
One number to keep in the back of your mind the whole way through: Anthropic reports that a single agent already uses about 4x the tokens of a normal chat, and a multi-agent system uses about 15x the tokens of a chat. Coordination is not free. The right pattern is the cheapest one that does the job reliably.
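A rough way to feel those multipliers is to run them against your own numbers. The sketch below uses the 4x and 15x figures quoted above; the baseline of 2,000 tokens per chat and the price of 10 dollars per million tokens are placeholder assumptions, not real pricing.

```python
# Multipliers quoted in the text, relative to a normal chat.
TOKEN_MULTIPLIER = {"chat": 1, "single_agent": 4, "multi_agent": 15}

# Assumptions for illustration only; substitute your own figures.
BASELINE_CHAT_TOKENS = 2_000
PRICE_PER_MILLION_TOKENS = 10.0  # dollars, hypothetical


def estimated_tokens(chat_tokens: int, setup: str) -> int:
    """Scale a chat-sized token count by the multiplier for the setup."""
    return chat_tokens * TOKEN_MULTIPLIER[setup]


def estimated_cost(chat_tokens: int, setup: str, price_per_million: float) -> float:
    """Dollar cost of one task under the given setup."""
    return estimated_tokens(chat_tokens, setup) / 1_000_000 * price_per_million


estimates = {
    setup: estimated_cost(BASELINE_CHAT_TOKENS, setup, PRICE_PER_MILLION_TOKENS)
    for setup in TOKEN_MULTIPLIER
}
```

Under these made-up inputs a task that costs about 2 cents as a chat costs about 8 cents as a single agent and about 30 cents as a multi-agent run. The absolute numbers are not the point; the ratio is.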
Pattern 1: Orchestrator-worker (the research team)
This is the pattern Anthropic used to build its production research system, and it is the one most people mean when they say "multi-agent." A lead agent reads the goal, plans, and decomposes it into independent subtasks, then spawns specialized worker agents that go off and handle those subtasks in parallel and report back. The lead synthesizes everything into one answer.
The everyday job this maps to is a research team. You ask one question that has many independent threads ("what are the five strongest competitors in this market and how do they price"), the lead splits it into one thread per competitor, five workers research in parallel, and the lead writes the brief. The results are striking when the work genuinely fits: Anthropic's multi-agent setup (a Claude Opus 4 lead with Claude Sonnet 4 subagents) outperformed a single Claude Opus 4 agent by 90.2% on their internal research evaluation, and parallelization cut research time by up to 90% on complex queries.
The catch is that the lead has to write good instructions. Anthropic found that vague, short task descriptions cause subagents to duplicate each other's work or misinterpret the goal. Each worker needs a clear objective, an output format, guidance on which tools and sources to use, and firm task boundaries. That instruction-writing is engineering, and it is exactly the kind of thing we own when we run this for a client.
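Here is one way the shape could look in code, as a sketch rather than Anthropic's implementation. `TaskBrief`, `plan`, `worker`, and `orchestrate` are hypothetical names, the worker is a stub where a real system would call a model, and the brief carries the four things each worker needs: objective, output format, sources, and boundaries.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskBrief:
    objective: str
    output_format: str
    allowed_sources: tuple[str, ...]
    boundaries: str


def plan(competitors: list[str]) -> list[TaskBrief]:
    """Lead step 1: decompose the goal into one independent brief per thread."""
    return [
        TaskBrief(
            objective=f"Summarize how {name} prices its product",
            output_format="one sentence",
            allowed_sources=("public pricing page",),
            boundaries=f"Cover {name} only; do not research other companies",
        )
        for name in competitors
    ]


def worker(brief: TaskBrief) -> str:
    """Stub worker; a real one would send the brief to a model as its prompt."""
    return f"[done] {brief.objective}"


def orchestrate(goal: str, competitors: list[str]) -> str:
    briefs = plan(competitors)
    # Lead step 2: run the workers in parallel; map() keeps input order.
    with ThreadPoolExecutor(max_workers=max(1, len(briefs))) as pool:
        findings = list(pool.map(worker, briefs))
    # Lead step 3: a single synthesis of everything the workers reported.
    return goal + "\n" + "\n".join(findings)


report = orchestrate("Competitor pricing brief", ["Acme", "Globex"])
```

Making the brief a typed structure instead of a free-text string is a small guard against the vague, short task descriptions that cause duplicated work.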
Tradeoff: highest breadth and the biggest speed gains on parallelizable work, but also the highest token cost. Use it when the information genuinely exceeds one context window and the threads are independent.
Pattern 2: Sequential (the assembly line)
The sequential pattern, also called a pipeline or prompt chaining, is the simplest multi-agent shape. Agents run in a fixed, linear order, each one taking the previous agent's output as its input. Draft, then review, then polish. There is no parallelism and no debate, just a deterministic hand-off down the line.
The business job is any assembly line where order matters. Content production is the classic: a research agent gathers facts, a drafting agent writes, an editing agent tightens, a compliance agent checks the claims. Invoice processing is another: extract the line items, validate against the purchase order, then post to the ledger. Because the order is fixed, the system is predictable and easy to reason about, which is why it is often the right first step up from a single agent.
The weakness is that a failure early in the line propagates downstream. If the research agent gets a fact wrong, every stage after it inherits the error, and there is no parallel path to catch it. That is why sequential pipelines usually want a validation step built in, which is where the maker-checker pattern below earns its place.
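The invoice example can be sketched as a list of stages run in order, with the validation stage stopping the line before a bad extraction reaches the ledger. Every name here is hypothetical, and each stage is a plain function standing in for an agent.

```python
from typing import Callable

Stage = Callable[[dict], dict]


def extract(doc: dict) -> dict:
    # Stub for an extraction agent: pull (item, quantity, unit price) rows.
    return {**doc, "line_items": [("widget", 2, 5.0), ("gadget", 1, 10.0)]}


def validate(doc: dict) -> dict:
    # Validation stage: stop here rather than pass a bad total downstream.
    total = sum(qty * price for _, qty, price in doc["line_items"])
    if total != doc["po_total"]:
        raise ValueError(f"invoice total {total} does not match PO {doc['po_total']}")
    return {**doc, "total": total}


def post(doc: dict) -> dict:
    # Stub for the ledger-posting agent.
    return {**doc, "status": "posted"}


def run_pipeline(doc: dict, stages: list[Stage]) -> dict:
    # Fixed linear order: each stage receives the previous stage's output.
    for stage in stages:
        doc = stage(doc)
    return doc


result = run_pipeline({"raw": "invoice text", "po_total": 20.0}, [extract, validate, post])
```

If the purchase-order total did not match, `validate` would raise and `post` would never run, which is the whole point of placing the check mid-line.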
Tradeoff: the most predictable and easiest to debug, with no coordination chaos, but the slowest because nothing runs in parallel, and fragile to early-stage mistakes that cascade.
Pattern 3: Concurrent (the second opinion)
The concurrent pattern, also known as fan-out/fan-in or map-reduce, sends the same input to several agents at once and then aggregates their results. Where orchestrator-worker splits a job into different pieces, concurrent runs the same piece through different agents in parallel to get multiple independent takes.
The job this maps to is anything where you want a second (or third) opinion, fast. Think of three agents independently scoring the same inbound sales lead, or several agents drafting different versions of the same ad so you can pick the best, or multiple analysts each evaluating a contract for a different category of risk and the results being merged into one report. It shines when the task is latency-sensitive and you can afford to spend tokens to buy a better or faster answer.
The cost is real on two fronts. It is resource-intensive by design, since you are running the work several times over, and it needs a conflict-resolution step: when three agents disagree, something has to decide which answer wins or how to merge them. That aggregation logic is part of the build, not an afterthought.
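A sketch of the lead-scoring case shows where the merge logic lives. The three scorers are hard-coded stubs, the median is just one possible conflict-resolution rule, and the `DISAGREEMENT_THRESHOLD` of 40 points is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import median

DISAGREEMENT_THRESHOLD = 40  # assumption: spread that triggers human review


# Three hypothetical scoring agents, each judging the same lead its own way.
def scorer_budget(lead: dict) -> int:
    return 80 if lead["budget"] >= 10_000 else 30


def scorer_fit(lead: dict) -> int:
    return 70 if lead["industry"] == "retail" else 40


def scorer_urgency(lead: dict) -> int:
    return 20 if lead["timeline_days"] > 90 else 90


SCORERS = [scorer_budget, scorer_fit, scorer_urgency]


def score_lead(lead: dict) -> dict:
    # Fan out: every scorer receives the same input at the same time.
    with ThreadPoolExecutor(max_workers=len(SCORERS)) as pool:
        futures = [pool.submit(scorer, lead) for scorer in SCORERS]
        scores = [future.result() for future in futures]
    # Fan in: the median resolves disagreement; a wide spread is flagged.
    spread = max(scores) - min(scores)
    return {
        "scores": scores,
        "final": median(scores),
        "needs_review": spread > DISAGREEMENT_THRESHOLD,
    }


outcome = score_lead({"budget": 25_000, "industry": "retail", "timeline_days": 120})
```

Here the scorers return 80, 70, and 20, so the median is 70 and the 60-point spread flags the lead for review. Voting, weighting, or a dedicated judge agent are equally valid merge rules; what matters is that one is chosen deliberately.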
Tradeoff: buys multiple perspectives and low latency by spending tokens in parallel, but it is the most resource-hungry per task and requires designed conflict resolution at the merge step.
Pattern 4: Handoff / triage (the front desk)
The handoff pattern, also called routing or triage, keeps exactly one agent active at a time. An incoming request hits a triage agent that figures out what kind of request it is and transfers full control, including the conversation state, to the right specialist. OpenAI calls this the decentralized pattern, where peer agents hand off execution rather than a manager calling them as tools.
This is the front desk of your business. A customer message arrives, the triage agent reads it, and routes it: billing questions to the billing agent, technical issues to the support agent, refund requests to the agent that knows your refund policy. The right specialist emerges during processing rather than being decided up front. It is the natural shape for support inboxes, help desks, and any high-volume intake where requests are varied but each one belongs to a clear category.
The danger is unpredictable routing. Because agents decide when to hand off, a poorly designed system can bounce a request between agents in an infinite handoff loop, or send it to the wrong specialist. Tight routing rules and guardrails are what keep it sane.
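One simple guardrail is a cap on the number of handoffs per request. The sketch below is an assumption-laden illustration: the agents are keyword-matching stubs, the names are hypothetical, and `MAX_HANDOFFS = 3` is an arbitrary limit. Each agent returns either the name of the next agent or `None` when it has finished.

```python
MAX_HANDOFFS = 3  # assumption: more transfers than this is treated as a loop


def triage(state: dict) -> str:
    # Stub router: a real triage agent would classify with a model.
    text = state["message"].lower()
    if "refund" in text:
        return "refunds"
    if "invoice" in text or "charge" in text:
        return "billing"
    return "support"


def billing(state: dict) -> None:
    state["reply"] = "Billing will review the charge."


def refunds(state: dict) -> None:
    state["reply"] = "Refund request received."


def support(state: dict) -> None:
    state["reply"] = "Support will look into it."


AGENTS = {"triage": triage, "billing": billing, "refunds": refunds, "support": support}


def handle(message: str) -> dict:
    # The shared state travels with the request on every handoff.
    state = {"message": message, "trail": []}
    current = "triage"
    while current is not None:
        if len(state["trail"]) > MAX_HANDOFFS:
            raise RuntimeError(f"handoff loop suspected: {state['trail']}")
        state["trail"].append(current)
        current = AGENTS[current](state)  # exactly one agent is active at a time
    return state


ticket = handle("I was charged twice this month")
```

Keeping the `trail` on the state also gives you an audit log of where each request went, which is the first thing you will want when routing misbehaves.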
Tradeoff: efficient because only one agent runs at a time (so the token cost is closer to a single agent than a full team), and it gets each request to the right expert, but routing can be unpredictable and needs guardrails against handoff loops.
Pattern 5: Maker-checker and magentic (the reviewer and the open-ended planner)
The last group covers the patterns where agents share a thread and a manager decides who speaks next. Microsoft groups these as group-chat orchestration, and two flavors matter most for business.
The first is maker-checker, the one we recommend most often for anything that touches customers or money. One agent (the maker) produces the work, and a second agent (the checker) reviews it against your rules before it is accepted. The maker drafts the customer email, the checker confirms it follows your tone and policy. The maker generates the invoice, the checker validates the totals. It is the cheapest reliability upgrade you can bolt onto almost any pipeline, and it pairs naturally with the sequential pattern.
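As a sketch, the loop is short: the maker drafts, the checker returns a list of problems, and the draft is only accepted when that list is empty. The rules, the stub agents, and the `max_rounds` cap are all hypothetical.

```python
def maker(request: str, feedback: list[str]) -> str:
    """Stub maker: drafts a reply and fixes whatever the checker flagged."""
    draft = f"Hi, thanks for your message about {request}."
    if "missing sign-off" in feedback:
        draft += " Best regards, Support"
    return draft


def checker(draft: str) -> list[str]:
    """Stub checker: returns a list of rule violations, empty if none."""
    problems = []
    if "Best regards" not in draft:
        problems.append("missing sign-off")
    if "guarantee" in draft.lower():
        problems.append("makes a guarantee")
    return problems


def make_and_check(request: str, max_rounds: int = 3) -> tuple[str, int]:
    feedback: list[str] = []
    for round_number in range(1, max_rounds + 1):
        draft = maker(request, feedback)
        feedback = checker(draft)
        if not feedback:
            # Accepted: nothing ships until the checker has no objections.
            return draft, round_number
    raise RuntimeError("checker rejected every draft; escalate to a human")


email, rounds = make_and_check("your order")
```

In this run the first draft is rejected for a missing sign-off and the second is accepted. The cap on rounds matters as much as the check itself, since it turns an endless argument between two agents into an escalation.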
The second is magentic, sometimes called the dynamic or task-ledger pattern. A manager agent builds and continually refines a running task ledger for open-ended problems that have no predetermined plan. It is the right tool when the goal is fuzzy and the steps cannot be known in advance, but it is slow to converge and can stall on ambiguous goals, so it is the heaviest pattern to operate.
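A heavily simplified sketch of the ledger idea follows. It is not Microsoft's implementation: `Ledger`, `specialist`, and `run_manager` are hypothetical names, the specialist is a stub, and the only "refinement" shown is appending follow-up tasks that a step uncovers. The `max_steps` cap stands in for real stall detection.

```python
from dataclasses import dataclass, field


@dataclass
class Ledger:
    goal: str
    facts: list[str] = field(default_factory=list)
    todo: list[str] = field(default_factory=list)
    done: list[str] = field(default_factory=list)


def specialist(task: str) -> tuple[str, list[str]]:
    """Stub worker: returns a finding plus any follow-up tasks it uncovered."""
    follow_ups = {"find root cause": ["check deploy log"]}
    return f"result of '{task}'", follow_ups.get(task, [])


def run_manager(goal: str, first_tasks: list[str], max_steps: int = 10) -> Ledger:
    ledger = Ledger(goal=goal, todo=list(first_tasks))
    steps = 0
    while ledger.todo:
        if steps >= max_steps:
            raise RuntimeError("no convergence; ledger still has open tasks")
        task = ledger.todo.pop(0)
        finding, new_tasks = specialist(task)
        ledger.facts.append(finding)
        ledger.done.append(task)
        # Refine the plan: queue newly discovered tasks that are not known yet.
        known = set(ledger.done) | set(ledger.todo)
        ledger.todo.extend(t for t in new_tasks if t not in known)
        steps += 1
    return ledger


ledger = run_manager("explain the outage", ["find root cause"])
```

The plan here starts with one task and grows to two as the first step reveals a follow-up, which is the essential difference from a fixed pipeline.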
One practical limit applies to all the group-chat shapes: Microsoft recommends keeping free-flowing group-chat or debate setups to three or fewer agents, because past that they get hard to control and prone to infinite loops.
Tradeoff: maker-checker buys accuracy for a modest token and latency cost and is worth it on high-stakes output. Magentic buys flexibility on truly open-ended work but is the slowest and hardest to keep on track.
The patterns side by side
| Pattern | Business job | Buys you | Costs you |
|---|---|---|---|
| Orchestrator-worker | A research team on a multi-thread question | Breadth and big speed gains (research time cut by up to 90%) | Highest token cost; needs sharp worker instructions |
| Sequential | An assembly line (content, invoicing) | Predictability; easy to debug | Slowest; early errors cascade downstream |
| Concurrent | A second opinion (lead scoring, drafts) | Multiple perspectives, low latency | Resource-heavy; needs conflict resolution |
| Handoff / triage | A front desk (support, help desk) | Right specialist per request; near single-agent cost | Routing can be unpredictable; loop risk |
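| Maker-checker | A reviewer on customer or money output | Accuracy and a safety net | Extra tokens and a little latency |
| Magentic | An open-ended planner on a fuzzy goal | Flexibility when the steps cannot be known in advance | Slowest to converge; can stall; heaviest to operate |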
The one rule that ties it together
If you take a single principle from all of this, take the read-versus-write rule. As LangChain puts it, read actions are inherently more parallelizable than write actions. Reading, searching, and exploring spread cleanly across many agents at once. Writing in parallel does not, because, as Cognition warns, actions carry implicit decisions, and conflicting decisions carry bad results. Their example sticks: two subagents building one game, where one makes a Super Mario background and the other a non-matching bird, because neither sees the other's implicit choices.
This is why even Anthropic's research system, which reads with many subagents in parallel, synthesizes the final report with a single agent in one unified call. So the practical default is: fan out for the reading and the exploration, then converge to one agent for the writing and any tightly coupled decisions. Most of the five patterns above are just structured ways of applying that rule to a specific job.
What makes a multi-agent system fail in production?
The patterns are the easy part. The hard part is that the moment you have more than one agent, you have a distributed system, and you inherit every classic distributed-systems failure: node failures, lost messages, and cascading errors where one agent's mistake snowballs through the rest. Microsoft's production reliability guidance is blunt about what a real deployment needs: timeouts and retries, output validation between agents, circuit breakers, compute isolation, durable external state, and context compaction so agents do not overflow their context window mid-task.
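Three of those items (timeouts, retries, and output validation) fit in a small wrapper. In the sketch below, the timeout is simulated by a stub worker raising Python's built-in `TimeoutError`; how a real timeout is enforced depends on your model client, and `FlakyWorker`, `is_valid`, and `call_with_retries` are hypothetical names.

```python
class FlakyWorker:
    """Stub worker that times out once, returns junk once, then succeeds."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, task: str) -> dict:
        self.calls += 1
        if self.calls == 1:
            raise TimeoutError("worker timed out")
        if self.calls == 2:
            return {"summary": ""}  # well-formed but empty: fails validation
        return {"summary": f"findings for {task}"}


def is_valid(output: dict) -> bool:
    # Output validation between agents: reject empty or missing summaries.
    return bool(output.get("summary"))


def call_with_retries(worker, task: str, retries: int = 2, fallback=None):
    for _ in range(retries + 1):
        try:
            output = worker(task)
        except TimeoutError:
            continue  # retry after a timeout
        if is_valid(output):
            return output
        # Invalid output is treated like a failure and retried as well.
    return fallback  # the lead always gets something it can act on


flaky = FlakyWorker()
result = call_with_retries(flaky, "ticket 123")
```

The `fallback` argument is the part demos tend to skip: when every attempt fails, the caller still receives a defined value, such as an escalation marker, instead of an unhandled exception.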
None of that shows up in a demo. It shows up three weeks into production when a worker agent times out and the lead has no fallback. This is the gap between a multi-agent system that looks impressive in a notebook and one that runs reliably inside your business every day. It is engineering, monitoring, and ownership, not a prompt.
So which pattern do you need?
You usually do not need to choose, because real systems combine them. A support operation might use handoff at the front door, a maker-checker loop on every outbound reply, and an orchestrator-worker burst when a ticket needs research across several systems. The skill is matching each job to the cheapest pattern that does it reliably, then wiring the reliability engineering underneath so it survives contact with real traffic.
That matching and operating is the work. Picking the pattern, writing the worker instructions, designing the conflict resolution and the validation steps, owning the roughly 15x token economics, and keeping the whole distributed system healthy is a job, not a setting. It is the one we do: we plan, build, and run the agents inside your company, single or coordinated, and we own the pattern choice, the cost, and the reliability so you get a system that runs in production instead of a science project.
If you want the right pattern picked and run for your specific work, book a free consultation below and we will map your first one together.
