Moving accounts payable and reconciliation from the roughly 33% touchless market average to 90% is not a setting you flip; it is a redesign you walk through, and the honest benchmark explains why. Even best-in-class teams sit around 49% straight-through processing, per Ardent Partners; the rest of the market clears about 1 in 3 invoices without a human touch; and the 90% figures you see in headlines come from named enterprises that rebuilt the workflow around the agents. Konica Minolta ran reconciliation 75% faster at more than 45,000 line items a day, Dr Pepper Snapple Group cut financial-services cost by $2.5M, and cost per invoice falls from about $12.88 to $2.78 when a team crosses from average to best-in-class. What actually closes that gap is unglamorous: master-data cleanup, tolerance tuning, exception routing, and human-in-the-loop gates that auditors trust. This article walks through the before and after numbers honestly, then shows the design that produces them.

This is a case-study walkthrough, not a pitch. If you would rather we do this for you, see how we run AI workflow automation inside other companies. Everything below is yours to use whether or not we ever talk.

What does the real benchmark say about touchless AP?

Start with the number almost every vendor page softens. Ardent Partners, in its "AP Metrics that Matter 2025" research, finds that the average accounts payable team processes only about 1 in 3 invoices touchless, and that even best-in-class teams, the top roughly 20% of the market, benchmark around 49.2% straight-through. Read that twice. The best teams in the world, the ones held up as the standard, automate fewer than half their invoices end to end. The 90% touchless figure is not the benchmark; it is the ceiling that a small set of enterprises reached by doing the work the rest skip.

That gap is the whole story of this article. When a case study shows 90% touchless, the distance to the market average is the distance between a redesigned, clean-data operation and the default state of an AP department, not the distance between "no software" and "some software." Plenty of teams buy capture and matching tools and still land at the market average, because the tool runs the same broken process faster. The benchmark gap is an execution gap, and that is good news: the lift is available to anyone willing to do the redesign, not just to companies with a bigger budget.

The economics make the gap concrete. Ardent Partners puts best-in-class cost at about $2.78 per invoice with a roughly 3-day cycle time, against about $12.88 and 17 or more days for the rest of the market. APQC and Ardent both peg manual AP at $8 to $15 in fully loaded cost per invoice, while AI processing at high straight-through rates lands at $1 to $3. On real volume, the per-invoice swing is the single line that carries the entire ROI case, which is why the first thing to measure is your current cost and cycle time, not the technology.
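To see why the per-invoice swing dominates, the arithmetic is short. Below is a minimal sketch; the 50,000-invoice volume and the 40% first slice are hypothetical inputs, not figures from Ardent or APQC, and the function treats every moved invoice as costing the best-in-class figure, which is a simplification.

```python
def annual_savings(invoices_per_year: int, cost_now: float, cost_target: float,
                   share_moved: float = 1.0) -> float:
    """Yearly savings when `share_moved` of invoice volume drops from cost_now to cost_target.

    Simplification: each moved invoice is assumed to cost exactly cost_target,
    and one-off program costs are ignored.
    """
    if not 0.0 <= share_moved <= 1.0:
        raise ValueError("share_moved must be between 0 and 1")
    moved = invoices_per_year * share_moved
    return moved * (cost_now - cost_target)


# Hypothetical volume of 50,000 invoices a year, using the Ardent per-invoice figures.
full = annual_savings(50_000, 12.88, 2.78)
first_slice = annual_savings(50_000, 12.88, 2.78, share_moved=0.4)
print(f"{full:,.0f} {first_slice:,.0f}")  # 505,000 202,000
```

The `share_moved` parameter is there because the savings scale with whatever slice you automate first, not only with the finished program.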

What does the "before" picture usually look like?

Picture a mid-sized finance team before any of this. Invoices arrive by email, PDF, supplier portal, EDI, and scanned paper, and each channel has its own informal handler. A clerk keys header fields and line items into the ERP, eyeballs the purchase order, walks an approval down a hallway or a Slack thread, and schedules payment. At the close, someone exports a bank statement, opens the general ledger next to it, and matches transactions by hand, chasing the ones that do not line up: a missing reference, a batched payment, a bank fee, an intercompany transfer, an FX difference.

The visible symptoms are familiar. Cost per invoice sits near the market's $12.88, cycle time runs past two weeks, the close takes days longer than anyone wants, and a quiet 0.1% to 0.5% of payments go out twice because nothing reliably catches duplicates. Touchless rate, if anyone measures it, hovers around a third. The team is not unskilled; the process simply requires a human at every junction because it was designed that way, before agents could read an invoice's meaning.
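Duplicates often slip through because a check compares raw strings, so "INV-0042" from "Acme, Inc." and "42" from "ACME" look like two different invoices. A minimal sketch of a normalized-key check follows; the suffix list and normalization rules are assumptions for illustration, and a production check would add fuzzy matching on amounts and dates rather than relying on exact keys alone.

```python
import re
from dataclasses import dataclass

# Hypothetical list of legal suffixes to ignore when comparing vendor names.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "co", "gmbh"}


@dataclass(frozen=True)
class Invoice:
    vendor_name: str
    invoice_number: str
    amount_cents: int


def normalize_vendor(name: str) -> str:
    """'Acme, Inc.' and 'ACME' both become 'acme'."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return " ".join(t for t in cleaned.split() if t not in LEGAL_SUFFIXES)


def normalize_number(number: str) -> str:
    """'INV-0042', 'Invoice #42' and '42' all become '42'."""
    s = re.sub(r"[^a-z0-9]", "", number.lower())
    s = re.sub(r"^(invoice|inv)", "", s)  # try the longer prefix first
    return s.lstrip("0") or "0"


def find_duplicates(invoices):
    """Return (first_seen, repeat) pairs that share vendor, invoice number and amount."""
    seen = {}
    pairs = []
    for inv in invoices:
        key = (normalize_vendor(inv.vendor_name),
               normalize_number(inv.invoice_number),
               inv.amount_cents)
        if key in seen:
            pairs.append((seen[key], inv))
        else:
            seen[key] = inv
    return pairs
```

The check only works if vendor records are clean enough for the normalized names to collide, which is why master-data cleanup comes up again later.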

This is the baseline that matters. You cannot prove a 90% touchless outcome without an honest "before," so the first move in any engagement is to capture four numbers: cost per invoice, touchless rate, cycle time, and days to reconcile. Without that baseline, the project has no ROI story and loses its budget the first time finance reviews spend.
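A sketch of computing those four numbers from history, assuming your ERP can export received and paid dates per invoice plus a count of manual touches. The `InvoiceRecord` fields and the `ap_baseline` name are hypothetical; map them onto whatever your system actually logs.

```python
from dataclasses import dataclass
from datetime import date
from statistics import fmean, median


@dataclass
class InvoiceRecord:
    received: date
    paid: date
    human_touches: int  # manual edits or approvals logged against the invoice (assumed field)


def ap_baseline(records, total_ap_cost, closes):
    """Four baseline numbers from a period of history.

    records: InvoiceRecord list for the period.
    total_ap_cost: fully loaded AP cost for the same period (staff, tools, overhead).
    closes: (period_end, reconciled_on) date pairs, one per month-end.
    """
    n = len(records)
    return {
        "cost_per_invoice": total_ap_cost / n,
        "touchless_rate": sum(r.human_touches == 0 for r in records) / n,
        "median_cycle_days": median((r.paid - r.received).days for r in records),
        "avg_days_to_reconcile": fmean((done - end).days for end, done in closes),
    }
```

Median cycle time is used here so a few stuck invoices do not distort the picture; reporting the mean alongside it is a reasonable choice too.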

How does AI change AP and reconciliation, mechanically?

Agents raise the ceiling that template OCR and RPA could not, because they read meaning instead of coordinates. Legacy capture tools matched fixed positions on a known invoice layout, so a new vendor template or an unexpected format broke the pipeline and dumped the work on a person. A large language model interprets the document the way a clerk would: it knows which number is the tax, which is the PO reference, which lines are charges, regardless of layout. That single shift is what lets touchless rates climb past the level where RPA plateaued.
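Reading meaning does not mean trusting the output blindly. A common pattern, sketched below under stated assumptions, is to have the extraction step fill a fixed schema and then run deterministic checks before anything is matched. The schema fields and `validation_errors` are hypothetical, and the model call itself is omitted because it depends on your provider.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List, Optional


@dataclass
class LineItem:
    description: str
    quantity: Decimal
    unit_price: Decimal


@dataclass
class ExtractedInvoice:
    """Fields the extraction step is asked to fill, whatever the supplier's layout."""
    vendor: str
    po_reference: Optional[str]
    lines: List[LineItem]
    tax: Decimal
    total: Decimal


def validation_errors(inv: ExtractedInvoice, tolerance: Decimal = Decimal("0.01")) -> List[str]:
    """Deterministic checks on the model's output; any error sends the invoice to a person."""
    errors = []
    if not inv.po_reference:
        errors.append("missing PO reference")
    if not inv.lines:
        errors.append("no line items extracted")
    subtotal = sum((li.quantity * li.unit_price for li in inv.lines), Decimal("0"))
    if abs(subtotal + inv.tax - inv.total) > tolerance:
        errors.append(f"lines {subtotal} + tax {inv.tax} != total {inv.total}")
    return errors
```

`Decimal` is used instead of `float` so the arithmetic check compares money exactly, and the one-cent tolerance absorbs supplier rounding.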

On the reconciliation side, the documented pattern is three cooperating agents, each with one job. A Transaction Matching Agent pulls bank statements, the ledger, and payment records together and matches them, including fuzzy cases where a reference is missing or a payment is batched; agentic systems auto-match 90% or more of transactions here. An Exception Management Agent takes whatever did not match, groups it by likely cause, and routes only the real problems to a person with context attached. A Journal Entry Automation Agent posts the entries for everything that reconciled cleanly, automating about 95% of postings, and stages the rest for approval.
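The deterministic core of a matching step can be sketched in a few passes: exact reference, then a unique same-amount entry within a date window, then a small search for batched payments. This is a rule-based stand-in, not any vendor's agent; the 3-day `day_window` and the `max_batch` of 3 are assumptions, and an agentic system would layer model judgment on top for mangled references. Whatever no pass can place becomes input to the exception step.

```python
from dataclasses import dataclass
from datetime import date
from itertools import combinations


@dataclass(frozen=True)
class Txn:
    """A bank line or a ledger entry; amounts in integer cents."""
    txn_id: str
    amount_cents: int
    on: date
    reference: str = ""


def _find_batch(target, pool, max_size):
    """Smallest group of 2..max_size entries whose amounts sum to target, or None."""
    for size in range(2, max_size + 1):
        for combo in combinations(pool, size):
            if sum(t.amount_cents for t in combo) == target:
                return list(combo)
    return None


def reconcile(bank, ledger, day_window=3, max_batch=3):
    """Return (matched, exceptions, unmatched_ledger).

    matched: (bank_id, [ledger_ids], rule) tuples.
    exceptions: bank lines no rule could place, handed to the exception step.
    """
    open_entries = {e.txn_id: e for e in ledger}
    matched, exceptions = [], []
    for b in bank:
        rule, group = None, None
        # Pass 1: reference and amount both agree.
        for e in open_entries.values():
            if b.reference and e.reference == b.reference and e.amount_cents == b.amount_cents:
                rule, group = "reference", [e]
                break
        if group is None:
            near = [e for e in open_entries.values() if abs((e.on - b.on).days) <= day_window]
            same_amount = [e for e in near if e.amount_cents == b.amount_cents]
            if len(same_amount) == 1:
                # Pass 2: reference missing or mangled, but one nearby entry has the amount.
                rule, group = "amount_and_date", same_amount
            elif not same_amount:
                # Pass 3: one bank line may settle several ledger entries (batched payment).
                batch = _find_batch(b.amount_cents, near, max_batch)
                if batch:
                    rule, group = "batched", batch
            # Two or more same-amount candidates are ambiguous: leave them to a person.
        if group is None:
            exceptions.append(b)
            continue
        for e in group:
            del open_entries[e.txn_id]
        matched.append((b.txn_id, [e.txn_id for e in group], rule))
    return matched, exceptions, list(open_entries.values())
```

The batched search uses `itertools.combinations`, which grows quickly with pool size, so it only looks at entries inside the date window and caps the group size. Ambiguous cases are deliberately left unmatched rather than guessed, which keeps the auto-match rate honest.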


The mechanical change is the foundation of every number above. The "after" state is not "the same process, faster." It is a process where an agent ingests, extracts, validates, matches, and posts on its own, and a human is pulled in only for true exceptions and money-moving decisions. McKinsey frames the shift three ways: automation of the repetitive keying and posting, augmentation of the analysis, and acceleration of the close, where reconciliation is the bottleneck. Gartner adds the trajectory: embedded AI in cloud ERP is expected to drive a 30% faster financial close by 2028. The capability is settled. The question is what it takes to realize it on a specific ledger.

What do the named enterprise results actually show?

Two named results anchor the "after" picture, and both come from real reconciliation deployments rather than projections. Konica Minolta ran bank reconciliation 75% faster while processing more than 45,000 line items a day. That second number matters as much as the first: at 45,000 line items a day, a human team cannot keep up at the close, so the gain is not the same work done faster; it is clearing a backlog that used to define month-end. Dr Pepper Snapple Group reported a $2.5M decline in financial-services cost alongside higher volume and productivity, which is the rarer and more convincing combination: cost down while throughput went up.

Around those named cases, the broader agentic-reconciliation results form a consistent band: around 99% accuracy, 90% or more of transactions auto-matched, about 95% of journal postings automated, and roughly a 30% cut in days to reconcile. Put next to the AP economics, where best-in-class clears invoices at about $2.78 against the market's $12.88, you get the full "before and after":

| Metric | Market average (before) | Best-in-class / documented (after) |
|---|---|---|
| Touchless / straight-through rate | ~33% | up to 90%; best-in-class benchmark ~49% |
| Cost per invoice | ~$12.88 | ~$2.78 |
| Invoice cycle time | 17+ days | ~3 days |
| Reconciliation accuracy | Manual, error-prone | ~99% |
| Transaction auto-match | Mostly manual | 90%+ |
| Journal-posting automation | Manual | ~95% |
| Days to reconcile | Baseline | ~30% fewer |

The honest caveat sits inside the same table. Best-in-class straight-through is about 49%, and the 90% figure is the top of the documented range, not a typical landing spot. The enterprises that reached it did so on redesigned, clean-data operations. Treat 90% as proof that the ceiling is real, and 80% as the ambitious-but-reachable target for a well-executed program. Either way, the gap between your current third and the high end is the size of the prize.

What actually closes the gap from 33% to 90%?

Here is the part the case studies skip, because it is the part the software does not do for you. Four work streams sit between the market average and the documented results, and each one independently caps your touchless rate if you ignore it.

  • Workflow redesign. Strip out the handoffs, rekeying, and approval chains that exist only because a person used to do the work. Automating a broken process just runs the mess faster, and it is the most common reason a tool purchase lands a team right back at the average.
  • ERP master-data cleanup. Deduplicated vendors, correct tax IDs, validated bank details. This is where entity resolution succeeds or fails, and it is where the 0.1% to 0.5% duplicate-payment leak gets sealed. Dirty master data quietly sinks more projects than any model limitation.
  • Tolerance tuning on 3-way matching. The matching agent compares each invoice to its purchase order and goods-receipt note line by line, with tolerance rules for small price or quantity differences. Set tolerances too tight and everything becomes an exception; too loose and errors get paid. This dial is, more than anything, what sets the touchless number.
  • Exception routing and human-in-the-loop design. Decide which cases an agent clears alone, which it routes to a person, and how context travels with each one. This is the work that earns trust, and it is covered in its own section below because it is where auditor trust is won or lost.
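Tolerance tuning is easiest to reason about with the check written down. A minimal sketch of a line-level 3-way match, assuming a percentage tolerance on unit price and an absolute tolerance on quantity; real ERPs expose these settings in different shapes, and the names here are hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, List


@dataclass
class Line:
    quantity: Decimal
    unit_price: Decimal


def three_way_exceptions(
    invoice: Dict[str, Line],        # SKU -> invoiced line
    po: Dict[str, Line],             # SKU -> ordered line
    received: Dict[str, Decimal],    # SKU -> quantity on the goods-receipt note
    price_tol_pct: Decimal = Decimal("2"),
    qty_tol: Decimal = Decimal("0"),
) -> List[str]:
    """Empty list means the invoice can pass touchless; anything else is an exception."""
    problems = []
    for sku, inv in invoice.items():
        ordered = po.get(sku)
        if ordered is None:
            problems.append(f"{sku}: not on the purchase order")
            continue
        price_var_pct = abs(inv.unit_price - ordered.unit_price) / ordered.unit_price * 100
        if price_var_pct > price_tol_pct:
            problems.append(f"{sku}: price off by {price_var_pct:.2f}%")
        if inv.quantity > received.get(sku, Decimal("0")) + qty_tol:
            problems.append(f"{sku}: billed more than received")
        if inv.quantity > ordered.quantity + qty_tol:
            problems.append(f"{sku}: billed more than ordered")
    return problems
```

With the same invoice, a 2% price tolerance passes a 1% variance touchless, while 0.5% turns it into an exception; that single parameter is the dial the tolerance bullet describes.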

None of that ships in a license. It is execution work, and it maps exactly onto the gap between Gartner's projection that 90% of finance functions will run at least one AI tool by 2026 and McKinsey's finding that only about 6% have actually scaled gen AI. Almost everyone has a tool. Almost no one has done the four work streams. A clean-data and process assessment up front is what saves months later, which is why serious engagements start with AI feasibility and data readiness rather than with the agents.

How do you design exception routing auditors trust?

The single biggest difference between a demo and a deployment is what happens to the cases that do not cleanly resolve. In a demo, everything matches. In production, a steady stream of invoices and transactions lands in a gray zone, and how you handle them decides whether finance, the close team, and the external auditor ever let the system run with real autonomy.

The design that works treats exceptions as a first-class workflow, not an afterthought. The agent does not silently guess and move on; it classifies the exception by likely cause (timing difference, bank fee, FX, intercompany, duplicate, genuine error), attaches the source documents and the reason it could not resolve, and routes it to the right person with everything they need to decide in one screen. Equally important, every action it completes cleanly carries the same trail: what it decided, what it matched against, which tolerance applied, and which documents it relied on. When an auditor asks "why was this invoice paid," the answer is a record, not a shrug.
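A sketch of that shape: classify the exception by likely cause, route it to a queue, and write the same audit record for routed and auto-cleared items alike. The keyword rules, queue names, and record fields are all hypothetical, and substring matching on a memo is a crude stand-in for the richer signals an agent would use.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

# Hypothetical keyword rules and queue names; a production agent would weigh richer signals.
CAUSE_KEYWORDS = [
    ("bank_fee", ("fee", "charge")),
    ("fx", ("fx", "exchange rate")),
    ("intercompany", ("interco", "intercompany")),
]
QUEUES = {
    "duplicate": "ap_lead",
    "timing_difference": "close_team",
    "bank_fee": "treasury",
    "fx": "treasury",
    "intercompany": "intercompany_accounting",
    "unclassified": "ap_lead",
}


def classify(memo: str, is_duplicate: bool = False, days_after_period_end: int = 0) -> str:
    if is_duplicate:
        return "duplicate"
    if days_after_period_end > 0:
        return "timing_difference"
    text = memo.lower()
    for cause, words in CAUSE_KEYWORDS:
        if any(w in text for w in words):
            return cause
    return "unclassified"  # no rule fired: a person decides whether it is a genuine error


@dataclass
class AuditRecord:
    item_id: str
    decision: str                    # "auto_cleared" or "routed"
    reason: str                      # rule or cause behind the decision
    matched_against: List[str]       # ledger / PO / receipt ids relied on
    tolerance_applied: Optional[str]
    source_documents: List[str]
    routed_to: Optional[str] = None
    at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


def route_exception(item_id: str, memo: str, documents: List[str], **signals) -> AuditRecord:
    cause = classify(memo, **signals)
    return AuditRecord(item_id, "routed", cause, [], None, documents, QUEUES[cause])


record = route_exception("B4", "Monthly account FEE", ["bank_statement_2024_01.pdf"])
print(json.dumps(asdict(record), indent=2))
```

An auto-cleared item gets the same record with `decision="auto_cleared"`, the rule that fired, and the ledger ids it matched, for example `AuditRecord("B1", "auto_cleared", "reference", ["L1"], None, ["bank_statement_2024_01.pdf"])`; that record is what answers the auditor's question.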

Human-in-the-loop gates then sit at the few points where the cost of an error is highest. Releasing payment is the obvious one: the action you never let an agent take unsupervised at the start, and you widen its leash only after it has earned trust on lower-risk decisions. Posting non-routine journal entries is another, and so is clearing any exception above a materiality threshold. Done this way, autonomy grows gradually and visibly, the touchless rate climbs without anyone losing control, and the close team stops drowning in noise because the agent only hands them the cases that genuinely need a human. That combination, autonomy with a clean audit trail and the right gates, is what separates the 90% deployments from the stalled pilots.
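Those gates can be expressed as an explicit policy rather than left implicit in code paths. A minimal sketch, assuming three action types and a dollar materiality threshold; the action names, defaults, and $10,000 threshold are illustrative, not a recommendation.

```python
from dataclasses import dataclass


@dataclass
class GatePolicy:
    """Hypothetical knobs; every default starts at the cautious end."""
    auto_release_payments: bool = False   # money movement stays human-approved at launch
    auto_release_limit_cents: int = 0     # widened only after the agent earns trust
    materiality_cents: int = 1_000_000    # exceptions above $10,000 always go to a person


def needs_human(action: str, amount_cents: int, routine: bool, policy: GatePolicy) -> bool:
    if action == "release_payment":
        return not policy.auto_release_payments or amount_cents > policy.auto_release_limit_cents
    if action == "post_journal":
        return not routine
    if action == "clear_exception":
        return amount_cents > policy.materiality_cents
    return True  # anything the policy does not know about defaults to a human
```

Widening the leash then means changing `auto_release_payments` and `auto_release_limit_cents` in a reviewed policy, not editing matching logic, which keeps the change itself auditable.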

Why is payback measured in months, not years?

The scaling averages hide a fact that the per-invoice economics make obvious: the savings start on the first batch you move, not at the end of a multi-year roadmap. When cost per invoice drops from roughly $12.88 toward $2.78 on the volume you have automated, the return is immediate and proportional to throughput. You do not have to reach 90% touchless across the whole operation to see it; you only have to move your highest-volume slice and measure the lift.

The ROI bands back this up. At 80% or more touchless, three-year ROI runs above 300%, and even at 60% to 70% touchless you are looking at roughly 2 to 3 times return over three years. Because those returns compound on volume you capture early, breakeven on a well-scoped program lands in months. The reason so many teams never feel that fast payback is the same reason the benchmark gap exists: they buy the tool, bolt it onto the old process, stall at the average, and conclude the technology underdelivered. It did not. The redesign never happened.
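Payback and ROI follow from the same per-invoice savings once you add program cost. A minimal sketch, using undiscounted arithmetic and (gain minus cost) divided by cost as the ROI definition; definitions vary, and the inputs below are illustrative, not benchmarks.

```python
import math


def payback_months(upfront: float, monthly_savings: float, monthly_run_cost: float) -> float:
    """Months until cumulative net savings cover the upfront spend (undiscounted)."""
    net = monthly_savings - monthly_run_cost
    return upfront / net if net > 0 else math.inf


def roi(upfront: float, monthly_savings: float, monthly_run_cost: float, months: int = 36) -> float:
    """(total savings - total cost) / total cost over the horizon, undiscounted."""
    total_cost = upfront + months * monthly_run_cost
    return (months * monthly_savings - total_cost) / total_cost


# Illustrative inputs only: $120k upfront, $40k/month savings, $10k/month run cost.
print(payback_months(120_000, 40_000, 10_000))  # 4.0
print(f"{roi(120_000, 40_000, 10_000):.0%}")    # 200%
```

Because `monthly_savings` is driven by the volume you have already moved, starting with the highest-volume slice is what pulls the payback figure down.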

What mistakes keep teams stuck at the average?

The failures repeat, and they are almost never about the model.

  • Treating AP and reconciliation as separate projects. They are one loop. A matching exception you wave through in AP becomes a discrepancy at the close. Scope them together, or you debug the same problem twice.
  • Skipping master-data cleanup. The most common reason do-it-yourself attempts stall. If vendor records are dirty, entity resolution fails, duplicates slip through, and the whole pipeline loses the team's trust.
  • Chasing full autonomy on day one. Releasing money is the last thing to automate, not the first. Start with tight gates and widen them as the agent earns it.
  • Believing case-study numbers ship in the box. The 90% touchless and 99% accuracy results come from the redesign, tuning, and exception work around the software. Buying the license and expecting the headline number is the surest way to land at 33%.
  • No baseline. If you never captured cost per invoice, touchless rate, cycle time, and days to reconcile before you started, you cannot prove the lift, and the project loses its funding.

Avoid these and you have won most of the battle, because the technology rarely fails here. The execution around it is what stalls, and the four work streams above are exactly what the stuck teams skipped.

How should you start?

Pick the highest-volume slice of the loop, usually invoice capture and 3-way matching, and build it end to end before you widen scope: ingest from every channel, extract with OCR plus an LLM, resolve and dedupe against cleaned master data, match with tuned tolerances, and route exceptions to a person with context attached. Capture your four baseline numbers first, then measure the lift. Once it clears your bar, extend the same loop into approvals, payment, and the three-agent reconciliation pattern at the close, keeping the human gates tight where the cost of error is real.

That sequence is the difference between joining the 90% of finance functions that own a tool and the small share that actually capture the 300%-plus ROI. The work that gets you there is the redesign, the clean data, and the exception design, not the license.

If you want the fastest path across the benchmark gap, you can skip the trial and error. We plan, build, and run the AP and reconciliation agents inside your existing ERP and bank feeds, own the master-data cleanup and tolerance tuning, and design the human-in-the-loop gates your auditors trust. Book a free consultation below and we will map your first slice together.