When HR Agents Overrun Budgets, the Exception Desk Decides

On a Tuesday morning in May 2026, a payroll operations lead at a national retailer asks an HR agent to fix a missing overtime payment before the next pay run closes. The request looks small. One employee. One correction. One service case.

The agent does not experience it as one task.

It reads the employee’s time punches, checks the worker profile, retrieves the applicable overtime policy, compares the manager approval trail, opens a case, drafts the employee response, calls the payroll workflow, logs the correction, and preserves an audit record. One connector times out. The agent retries. A model fallback handles the policy summary because the first route is unavailable. The case now touches Microsoft, Workday or Oracle, ServiceNow, a model provider, an integration layer, a payroll vendor, and an evidence store.

The employee sees an answer. Finance sees a stack of small charges.

The stack is no longer theoretical. Salesforce publishes Agentforce examples with actions and Flex Credits. Microsoft bills Copilot agent usage through a message meter and sells Agent 365 as a governance layer. Workday makes Sana available through Flex Credits. Oracle’s HCM agent pricing includes authorized-user, employee, and token-pool metrics. ServiceNow’s Action Fabric and AI Control Tower put consumption metering, audit trails, session management, and cost tracking around cross-system agent work.

Outcome pricing created the first buyer argument: did the agent produce a result worth paying for? The next argument is more operational. When the agent produces too many billable events on the way to that result, who decides whether the overrun was approved work, avoidable waste, buyer-caused friction, or a vendor defect?

HR teams will not answer that at annual renewal. The argument will arrive in monthly invoice exceptions.

A Payroll Retry Hits Three Meters

The cleanest way to understand the new cost problem is to follow a failed step.

A payroll correction agent tries to update an employee service case after verifying the overtime issue. The ServiceNow tool call fails because the case API returns a temporary error. The agent retries. The second call succeeds, but the failed first attempt had already triggered a logged action, a message, and an integration event. The agent then asks the employee a clarifying question because the manager approval note conflicts with the time record. That exchange creates another message. The workflow exports a replay file because payroll corrections can become wage-hour evidence.

No single charge is alarming. Together they turn one employee case into a dispute.

The vendor may say the agent did useful work and followed the configured retry policy. HR operations may say the retry prevented a manual case. Finance may ask why the same case consumed two action events, several messages, an evidence export, and a model fallback. Legal may say the export was required. IT may say the connector failure came from the vendor’s side. Procurement may ask whether the failed call should be credited.

The organization needs a place where those positions become one decision: an agent cost exception desk. It does not have to be a permanent department. It can be an operating process run by HR operations, Finance, IT, procurement, security, and legal. It needs a queue, a taxonomy, an evidence pull, an owner, a vendor response clock, and a commercial decision.

Without that process, every overage becomes a story told from a different dashboard.

The cost unit matters because HR workflows carry employment consequences. A customer support bot that retries a knowledge lookup may waste a few cents. A payroll agent that retries a correction may also change records, notify an employee, trigger wage evidence, and start a compliance clock. A recruiting agent that reprocesses an applicant pool may touch candidates who already received a disposition. A performance agent that regenerates manager notes may create inconsistent evidence.

Small charges are attached to sensitive work.

The exception process has to classify the overage before it argues about the money. A usage spike caused by an approved hiring surge is different from a spike caused by a connector loop. A high evidence-export cost for a termination-support workflow is different from the same export policy applied to questions about cafeteria hours. A model fallback approved by the workflow owner is different from an expensive route silently selected by the vendor.

The first dispute is rarely about dollars. It is about control.

Product Launches Made Overage Visible

Vendors have started exposing enough pricing and control language for buyers to see where exceptions will come from.

Salesforce Agentforce pricing lists Flex Credits at $500 per 100,000 credits and shows standard actions consuming 20 credits each. Its public examples are useful because they turn agent work into arithmetic. A new employee onboarding example counts questions, actions, credits, and monthly cost. A case-management example scales the same logic across users, cases, and days.
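The published rates reduce to simple arithmetic. The sketch below uses only the two figures quoted above ($500 per 100,000 Flex Credits and 20 credits per standard action), which work out to $0.10 per action. The monthly case volume and actions per case are hypothetical values chosen to show how one extra retried action per case moves the bill; this is not Salesforce's own calculator.

```python
# Figures quoted in the article for Salesforce Agentforce Flex Credits.
PRICE_PER_BLOCK_USD = 500.0
CREDITS_PER_BLOCK = 100_000
CREDITS_PER_STANDARD_ACTION = 20


def cost_per_action_usd() -> float:
    """Dollar cost of one standard action at the quoted list price."""
    return PRICE_PER_BLOCK_USD / CREDITS_PER_BLOCK * CREDITS_PER_STANDARD_ACTION


def monthly_action_cost_usd(cases_per_month: int, actions_per_case: int) -> float:
    """Monthly spend if every case consumes the same number of standard actions."""
    return cases_per_month * actions_per_case * cost_per_action_usd()


# Hypothetical workload: 2,000 payroll cases a month at 6 actions each,
# compared with the same cases when one action per case is retried.
baseline = monthly_action_cost_usd(2_000, 6)
with_retry = monthly_action_cost_usd(2_000, 7)
print(f"per action: ${cost_per_action_usd():.2f}")
print(f"baseline: ${baseline:,.2f}  with one retry per case: ${with_retry:,.2f}")
```

In this hypothetical, one retry per case adds a sixth to the action bill, the kind of delta an exception desk has to attribute to an owner.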

The transparency is helpful. It also teaches buyers to audit every workflow path.

Microsoft Copilot pay-as-you-go documentation describes agent and custom Copilot usage through a Copilot Studio message meter, while Agent 365 gives companies a governance layer for agents across identity, security, data governance, registry, analytics, and lifecycle control. In practice, that means the same HR automation can create a usage event in the place where employees work and a governance cost in the place where agents are managed.

Workday introduced Sana as an agentic work layer that can answer questions, take action, and automate workflows across Workday and connected applications. Sana for Workday, Sana Self-Service Agent, and Sana Enterprise run through Workday Flex Credits. That puts HR self-service and cross-application agent work inside a credit wallet rather than a simple seat.

Oracle’s Fusion Agentic Applications for HR show the system-of-record version of the same shift. Oracle describes coordinated agent applications for hiring, employee help, manager support, talent calibration, workforce operations, career advancement, learning, and contract compliance. Its price-list structure includes custom AI agent metrics and AI token pools. Population coverage, authorized users, and model capacity can all matter in one HCM deployment.

ServiceNow’s Action Fabric announcement is especially relevant to exceptions because it lets external agents call ServiceNow actions through a generally available MCP Server while routing work through AI Control Tower. ServiceNow names identity verification, permission scope, auditability, OAuth, session management, role-based tool packages, and consumption metering. Its separate AI Control Tower expansion adds cost tracking and ROI dashboards for AI systems, agents, and workflows.

These announcements do not prove vendors are overcharging. They prove agent work has become meterable at multiple layers.

A leave-policy question can be cheap if it stays inside a knowledge base. It can become expensive if it checks eligibility, opens a case, writes to Workday, sends a message, preserves evidence, and escalates to a specialist. A candidate-screening workflow can be cheap if it filters before summarizing. It can become expensive if it summarizes every applicant, checks duplicate profiles, sends messages, schedules interviews, exports audit records, and retries failed ATS updates.

The pricing objects are moving closer to the workflow.

Engineering teams have a fair point: cost should follow work. The governance challenge is that HR work is not a single-vendor object. A workflow that starts in Copilot, calls Workday, writes to ServiceNow, uses Salesforce case logic, runs an Oracle HCM process, and preserves evidence in a security stack will not produce one clean bill. It will produce partial bills from systems that each see part of the run.

Exception handling is the buyer’s attempt to reconstruct the run before the renewal room turns it into leverage.

Finance Was Already Moving Toward This Fight

The agent exception desk will not arrive as a new HR invention. Finance and IT cost teams have already been pulled into the same problem from the other side.

The State of FinOps 2026 report says FinOps has expanded beyond public cloud into AI, SaaS, licensing, private cloud, data center, and even labor cost in some organizations. It reports that 98% of respondents now manage AI spend, up from 31% two years earlier, and that 90% manage SaaS or plan to do so in the coming year. The language has shifted from cloud optimization toward technology value management.

HR agents sit across those categories. They are not pure cloud workloads. They are not only SaaS licenses. They are not only model tokens. They are not only labor substitution. They are work processes that convert HR operations into measurable technology consumption.

Zylo’s 2026 SaaS Management Index shows why Finance will not wait patiently. In its survey of 218 IT leaders, 78% reported unexpected charges tied to consumption-based or AI pricing models in the previous 12 months, and 61% said unplanned SaaS cost increases forced them to cut projects. Zylo also found business units control most SaaS spend while IT directly manages a much smaller share.

The pattern maps directly to HR. HR owns the workflow need. IT owns many of the controls. Finance sees the bill. Procurement sees the renewal. Legal sees the recordkeeping exposure. Vendors see adoption.

The exception desk is the operating bridge between those groups.

It should not only ask whether a charge is high. It should ask whether the charge corresponds to an approved value path. Did the workflow process more candidates because hiring volume grew, or because the filter ran too late? Did the payroll case cost more because evidence was legally required, or because the connector retried a broken write? Did a premium model route improve a high-risk decision, or did it fire because no one set a cap?

FinOps can provide discipline around allocation, showback, forecasting, and anomaly detection. HR has to provide the business meaning of the anomaly. A spike in storage, tokens, or actions may be waste in one workflow and required evidence in another.

Cost data alone cannot know the difference.

Recruiting Turns Small Charges Into Large Arguments

Recruiting makes the exception problem visible earlier than most HR functions because volume converts tiny charges into material spend.

An individual payroll correction may involve a handful of events. A candidate-screening agent can run across tens of thousands of applications during a seasonal surge. It may pull requisition criteria, de-duplicate profiles, read resumes, check knockout questions, infer availability, send candidate messages, schedule interview blocks, prepare manager packets, update ATS stages, and export evidence for rejected candidates.

One inefficient design choice can scale quickly.

If the agent summarizes every resume before applying minimum criteria, model and message costs rise. If it calls the calendar before confirming work authorization, scheduling costs rise. If it sends candidate messages before deduplicating profiles, communication costs rise. If it exports full evidence for every low-risk interaction, compliance storage and audit costs rise. If it retries a broken integration across the whole pool, the incident becomes a bill.
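The ordering effect can be made concrete with a small cost model. This is a minimal sketch under stated assumptions: the per-event prices, the 25% pass rate, and the function name screening_cost are hypothetical, and it models only two steps (a cheap knockout check and a model summary) out of the many listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitCosts:
    """Hypothetical per-event prices in USD; replace with contract rates."""
    knockout_check: float = 0.02  # rules check against minimum criteria
    resume_summary: float = 0.30  # model call that summarizes one resume


def screening_cost(applications: int, pass_rate: float, filter_first: bool,
                   costs: UnitCosts = UnitCosts()) -> float:
    """Cost of knockout checks plus resume summaries for one applicant pool.

    filter_first=True summarizes only applicants who pass minimum criteria;
    filter_first=False summarizes every resume before filtering.
    """
    if not 0.0 <= pass_rate <= 1.0:
        raise ValueError("pass_rate must be between 0 and 1")
    checks = applications * costs.knockout_check
    summarized = applications * pass_rate if filter_first else applications
    return checks + summarized * costs.resume_summary


# Hypothetical seasonal pool: 40,000 applications, 25% meet minimum criteria.
late = screening_cost(40_000, 0.25, filter_first=False)
early = screening_cost(40_000, 0.25, filter_first=True)
print(f"summarize first: ${late:,.0f}  filter first: ${early:,.0f}")
```

Under these assumed prices the summarize-first path costs more than three times as much for the same pool ($12,800 versus $3,800), which is why a forecast should record step order and not only the list of steps.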

The demand pressure is real. iCIMS and Aptitude Research reported on April 30, 2026, that 69% of companies use AI in talent acquisition, 46% are using or planning to use agentic AI, and 45% do not yet have a formal AI governance framework. Screening, candidate communication, assessments, and sourcing are already common use cases. That means high-volume recruiting teams are adding agentic workflows before many organizations have cost and governance controls mature enough to absorb them.

Finance will not object to useful volume. It will object to unclassified volume.

The recruiting leader may say the agent processed 40,000 applications and saved recruiters hundreds of hours. Finance may say the workflow consumed far more actions, messages, model calls, identity checks, and evidence exports than forecast. Legal may say some exports were necessary because candidate disposition decisions require records. Procurement may say the vendor counted duplicate applicants as separate outcomes. The vendor may answer that the buyer’s requisition data caused the extra processing.

Everyone can be partly right.

The exception desk has to separate five situations that look similar on a bill:

Overage pattern	Likely owner	Commercial treatment
Approved hiring surge	Business owner	Billable, but should update forecast and cap
Bad source data created rework	Buyer process owner	Usually billable unless vendor skipped validation
Connector outage caused retries	Vendor or integration owner	Credit or exclusion depends on fault and retry rules
Wrong workflow route used premium model or extra tools	Vendor, workflow owner, or both	Requires replay file and approval history
Evidence export exceeded policy tier	Legal, compliance, or configuration owner	Billable if required, disputed if misclassified

The table is uncomfortable because it prevents a simple answer. Buyers cannot call every overage a vendor defect. Vendors cannot call every overage successful adoption. The same invoice line can represent growth, waste, risk control, or failure.

Recruiting also creates a human trust problem. If an agent spends too much because it reprocessed candidates after using stale criteria, the budget issue is secondary. The company may need to reopen candidate decisions, notify recruiters, repair manager packets, or preserve evidence for a complaint. A cost exception becomes an operating incident.

HR cannot delegate exception handling entirely to cloud FinOps or SaaS management. Those teams understand meters, rightsizing, rate cards, and chargeback. HR operations understands why a duplicate candidate, wrong shift, missing credential, inaccessible interview slot, or stale salary band changes the business meaning of a charge.

The desk needs both.

Exception Codes Decide Who Pays

An exception process only works if the organization codes failures with enough precision to support a commercial decision.

“The agent cost too much” is not a code. “The vendor should refund this” is not a diagnosis. “AI made a mistake” is not evidence.

A useful HR agent exception desk should start with a short, enforceable taxonomy:

Exception code	Definition	Evidence needed
Source-data defect	Authoritative HR, payroll, ATS, or policy data was wrong or missing before the agent acted.	Source record, timestamp, data owner, workflow dependency
Workflow design defect	The configured path caused unnecessary steps, late filtering, duplicate actions, or avoidable evidence export.	Workflow version, run plan, approval record, expected path
Vendor execution defect	The agent skipped a configured step, used an unapproved source, misrouted a tool call, or failed to preserve required evidence.	Replay file, tool trace, model route, policy check, audit log
Integration defect	Connector timeout, duplicate write, failed API call, or MCP/server issue caused retries or repeated actions.	Error log, retry count, owning system, incident window
Model route defect	The workflow used a more expensive model, fallback, or prompt path without the required approval.	Model route, fallback trigger, approval setting, cost delta
Human-review defect	A required reviewer approved too quickly, lacked authority, ignored warnings, or caused rework.	Reviewer identity, warning state, approval time, override note
Evidence defect	Vendor or platform cannot produce the trace needed to judge the charge or employment action.	Missing fields, export failure, retention setting, vendor response
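One way to make the taxonomy enforceable is to attach the required evidence to each code and refuse to rule until it is present. The sketch below paraphrases the table's evidence column into hypothetical snake_case field names; the split between vendor-produced and buyer-produced items is an assumption for illustration. It encodes the article's treatment of evidence defects: when vendor-side evidence is missing, the claim is reclassified as an evidence defect.

```python
from enum import Enum


class ExceptionCode(Enum):
    SOURCE_DATA = "source-data defect"
    WORKFLOW_DESIGN = "workflow design defect"
    VENDOR_EXECUTION = "vendor execution defect"
    INTEGRATION = "integration defect"
    MODEL_ROUTE = "model route defect"
    HUMAN_REVIEW = "human-review defect"
    EVIDENCE = "evidence defect"


# Evidence per code, paraphrased from the table. Field names are hypothetical.
REQUIRED_EVIDENCE = {
    ExceptionCode.SOURCE_DATA: {"source_record", "timestamp", "data_owner", "workflow_dependency"},
    ExceptionCode.WORKFLOW_DESIGN: {"workflow_version", "run_plan", "approval_record", "expected_path"},
    ExceptionCode.VENDOR_EXECUTION: {"replay_file", "tool_trace", "model_route", "policy_check", "audit_log"},
    ExceptionCode.INTEGRATION: {"error_log", "retry_count", "owning_system", "incident_window"},
    ExceptionCode.MODEL_ROUTE: {"model_route", "fallback_trigger", "approval_setting", "cost_delta"},
    ExceptionCode.HUMAN_REVIEW: {"reviewer_identity", "warning_state", "approval_time", "override_note"},
}

# Assumption: items the vendor or platform is expected to produce.
VENDOR_PRODUCED = {"replay_file", "tool_trace", "model_route", "audit_log", "fallback_trigger", "cost_delta"}


def missing_evidence(code: ExceptionCode, available: set[str]) -> set[str]:
    """Evidence the desk still needs before it can rule on this code."""
    return REQUIRED_EVIDENCE.get(code, set()) - available


def effective_code(code: ExceptionCode, available: set[str]) -> ExceptionCode:
    """Reclassify as an evidence defect when vendor-produced evidence is missing."""
    if missing_evidence(code, available) & VENDOR_PRODUCED:
        return ExceptionCode.EVIDENCE
    return code


# A vendor-execution claim arrives without the replay file.
available = {"tool_trace", "model_route", "policy_check", "audit_log"}
print(effective_code(ExceptionCode.VENDOR_EXECUTION, available).value)
```

A source-data claim missing only the data owner stays a source-data defect in this sketch, because that gap sits on the buyer's side.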

The evidence defect should receive special treatment. If the vendor cannot produce the replay file, tool trace, or cost record needed to judge the event, the buyer should not carry the uncertainty. Employment workflows already require defensible records. A vendor that charges for agent work has to prove what work was done.

Regulation makes that more than a procurement preference. The California Civil Rights Department said final employment automated-decision regulations took effect on October 1, 2025, and require employers and covered entities to maintain employment records, including automated-decision data, for a minimum of four years. Colorado’s SB26-189, signed as a 2026 act, requires developers and deployers to retain records necessary to demonstrate compliance for at least three years and gives affected consumers a post-adverse-outcome process that can include a plain-language description, data correction, human review, and reconsideration.

Recordkeeping changes the cost debate. An audit export for a rejected candidate, payroll correction, promotion screen, or scheduling decision may look like an extra billable event. It may also be the reason the employer can answer a later request. The exception desk has to know which exports were required by risk policy and which were generated because the workflow classified the case incorrectly.

The taxonomy also protects vendors.

If a payroll correction became expensive because the buyer’s timekeeping data was wrong for two weeks, the vendor should not automatically lose the fee. If a recruiting workflow reprocessed candidates because HR changed criteria after launch, the cost may belong to the business owner. If the agent generated extra messages because the recruiter kept asking clarifying questions, the meter did what the user asked.

Blanket refund language sounds strong in negotiation. It breaks down in operations.

The better clause is conditional. Vendor-controlled defects trigger refund or service credit. Buyer-controlled defects remain billable but generate root-cause actions. Shared-process defects go to a pre-agreed split or service review. Evidence defects favor the buyer because the vendor failed to support verification.

The same logic should apply to approvals. An overage that crosses a workflow cap should create an approval event before the agent continues, unless delay would create legal or payroll harm. If the agent continues without approval, the buyer should be able to dispute the excess. If the buyer approves the overage, it should not later treat the approved spend as surprise waste.

Approval has to be specific. “Continue this payroll correction because the next pay run closes in six hours” is useful. “Allow more AI usage” is not.
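The difference between a specific and a blanket approval can be expressed as scope. A minimal sketch, assuming a hypothetical Approval record with a workflow, a run ID, a dollar ceiling, a reason, and an expiry: an approval covers an overage only when every scope field matches. Code cannot judge whether a reason is good, so the check only requires that one exists; the run and amount scoping is what rejects a blanket "Allow more AI usage".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Approval:
    """A scoped overage approval. All field names are hypothetical."""
    workflow: str
    run_id: Optional[str]  # None means the approval is not tied to one run
    max_extra_usd: float
    reason: str
    expires_at: datetime


def approval_covers(a: Approval, workflow: str, run_id: str,
                    extra_usd: float, now: datetime) -> bool:
    """True only if the approval names this workflow and run, covers the amount,
    gives a non-empty reason, and has not expired."""
    return (
        a.workflow == workflow
        and a.run_id == run_id
        and extra_usd <= a.max_extra_usd
        and bool(a.reason.strip())
        and now < a.expires_at
    )


now = datetime(2026, 5, 19, 9, 0)
specific = Approval("payroll-correction", "run-0412", 5.00,
                    "Next pay run closes in six hours", now + timedelta(hours=6))
blanket = Approval("payroll-correction", None, 10_000.0,
                   "Allow more AI usage", now + timedelta(days=365))
print(approval_covers(specific, "payroll-correction", "run-0412", 3.20, now))
print(approval_covers(blanket, "payroll-correction", "run-0412", 3.20, now))
```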

The exception code turns an invoice fight into an operating record. It gives Finance a basis for credit. It gives HR operations a process fix. It gives vendors a way to distinguish bad data from bad product. It gives legal a path to preserve evidence when a cost issue also touches an employment claim.

Most important, it keeps the first argument over a charge from becoming a screenshot war.

Budget Caps Move Into Workflow Design

The seat era let procurement think about cost before usage. Buy licenses. Assign seats. Watch adoption. Renegotiate at renewal.

Agentic HR breaks that sequence. A workflow can spend every time it runs, and design decisions inside the workflow determine how much work is metered.

Budget control moves into workflow design.

A candidate-screening workflow should have a cost forecast before launch. The forecast should estimate applications per month, average tool calls per application, model route, message volume, evidence policy, integration events, expected human reviews, retry limits, and cost per qualified candidate. It should show how spend changes if the applicant pool doubles, if the model route changes, if every rejected candidate requires evidence export, or if the ATS connector degrades.
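The forecast inputs above fit in a small model that can be re-run per scenario. This sketch is hypothetical throughout: every price and rate in base is invented for illustration, the listed inputs are collapsed into per-application averages, and human review and retry limits are omitted for brevity.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ScreeningForecast:
    """Forecast inputs from the paragraph above; all values are hypothetical."""
    applications_per_month: int
    tool_calls_per_app: float
    cost_per_tool_call: float
    model_cost_per_app: float
    messages_per_app: float
    cost_per_message: float
    evidence_export_share: float  # fraction of applications exported
    cost_per_export: float
    qualified_rate: float  # fraction that become qualified candidates

    def monthly_cost(self) -> float:
        per_app = (self.tool_calls_per_app * self.cost_per_tool_call
                   + self.model_cost_per_app
                   + self.messages_per_app * self.cost_per_message
                   + self.evidence_export_share * self.cost_per_export)
        return self.applications_per_month * per_app

    def cost_per_qualified(self) -> float:
        return self.monthly_cost() / (self.applications_per_month * self.qualified_rate)


base = ScreeningForecast(
    applications_per_month=10_000, tool_calls_per_app=4, cost_per_tool_call=0.10,
    model_cost_per_app=0.05, messages_per_app=2, cost_per_message=0.02,
    evidence_export_share=0.2, cost_per_export=0.25, qualified_rate=0.1,
)
scenarios = {
    "baseline": base,
    "pool doubles": replace(base, applications_per_month=20_000),
    "premium model route": replace(base, model_cost_per_app=0.15),
    "every rejection exported": replace(base, evidence_export_share=1 - base.qualified_rate),
    "ATS connector degrades (50% more calls)": replace(base, tool_calls_per_app=6),
}
for name, f in scenarios.items():
    print(f"{name}: ${f.monthly_cost():,.0f}/month, ${f.cost_per_qualified():.2f} per qualified")
```

In this linear model, doubling the pool doubles monthly spend but leaves cost per qualified candidate unchanged; the route, export, and connector scenarios raise the unit cost, which is the signal Finance would watch.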

A payroll correction workflow needs a different model. The forecast should include case volume, system checks, pay-cycle dependency, evidence retention, escalation rate, retry policy, jurisdiction-specific policy checks, and service-credit triggers. It should separate low-risk informational cases from wage-impacting corrections.

An employee service workflow needs tiering. A handbook answer should not carry the same cost path as a leave eligibility case, benefits update, employee relations triage, or accommodation request. If all cases run through the same expensive evidence and approval path, the workflow will look compliant and wasteful at the same time.

Finance will ask for caps by workflow, not only by product:

Maximum monthly spend by named workflow.
Maximum cost per run before approval.
Maximum retries before human review.
Model fallback approval rules.
Evidence-export tier by risk class.
Excluded events for vendor-caused failures.
Overage approval path by business owner.
Refund and service-credit triggers by exception code.
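Several of these caps can be checked mechanically on each run. A minimal sketch with hypothetical class names, field names, and thresholds; it covers the per-run, retry, fallback, and monthly caps and leaves evidence tiers, exclusions, and credit triggers to the exception codebook.

```python
from dataclasses import dataclass


@dataclass
class WorkflowCaps:
    """Per-workflow caps; names and values are hypothetical."""
    max_monthly_usd: float
    max_run_usd: float
    max_retries: int
    fallback_requires_approval: bool = True


@dataclass
class RunUsage:
    cost_usd: float
    retries: int
    used_fallback: bool
    fallback_approved: bool = False


def cap_breaches(caps: WorkflowCaps, run: RunUsage, month_to_date_usd: float) -> list[str]:
    """Return the caps this run breaches; an empty list means it may proceed."""
    breaches = []
    if run.cost_usd > caps.max_run_usd:
        breaches.append("run cost above per-run cap: needs approval")
    if run.retries > caps.max_retries:
        breaches.append("retries above limit: route to human review")
    if caps.fallback_requires_approval and run.used_fallback and not run.fallback_approved:
        breaches.append("unapproved model fallback")
    if month_to_date_usd + run.cost_usd > caps.max_monthly_usd:
        breaches.append("monthly workflow cap would be exceeded")
    return breaches


caps = WorkflowCaps(max_monthly_usd=2_000.0, max_run_usd=1.50, max_retries=2)
heavy_run = RunUsage(cost_usd=1.80, retries=3, used_fallback=True)
for breach in cap_breaches(caps, heavy_run, month_to_date_usd=1_999.0):
    print(breach)
```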

These controls should live near the workflow builder. If HR operations changes a workflow from “filter first, summarize later” to “summarize first, filter later,” the cost forecast should change. If legal raises the evidence tier for hiring decisions, Finance should see the cost impact. If procurement negotiates free retries for vendor-caused connector failures, the exception desk should see that term when reviewing an overage.

The best vendors will not hide this behind an invoice PDF. They will show expected spend before the workflow goes live, then show actual spend by run, exception, and owner after launch.

Run-level budgets also defend vendors.

Without run-level budgets, buyers will slow adoption after the first surprise bill. With run-level budgets, vendors can prove that a workflow became expensive because demand grew, not because the product was wasteful. They can also show where buyers approved expensive evidence, premium routing, or human escalation because the work justified it.

Usage-based pricing can align price with value only if the buyer can see the route from usage to value.

For HR, that route runs through people, policy, payroll, recruiting, manager decisions, employee trust, and legal records. The meter has to follow the route without pretending the route is simple.

Shared Responsibility Beats Blanket Refunds

The exception desk will fail if every participant arrives with a moral claim.

Vendors will say buyers configured the workflow, supplied the data, and approved the rollout. Buyers will say vendors designed the agent, counted the events, and promised automation savings. HR will say the workflow served the business. Finance will say the spend exceeded the forecast. Legal will say evidence was non-negotiable. IT will say the integration behaved inside its documented limits.

All of those claims can be true.

Shared responsibility has to be explicit before the bill arrives. The contract and operating playbook should name which party owns which failure modes.

The vendor should own agent orchestration against the approved workflow, model route disclosure, tool-call logging, evidence export, retry behavior, known-defect remediation, permission enforcement, excluded source controls, and replay-file production. If the agent uses an unapproved data source, skips a required approval, silently upgrades to a costly model route, or cannot produce the trace needed to judge a charge, the vendor has a strong refund problem.

The buyer should own source-system accuracy, policy maintenance, workflow approval, human reviewer capacity, cost-owner assignment, role and permission design, escalation paths, and business-volume decisions. If HR launches a screening workflow against stale requisitions or Finance approves a seasonal hiring surge, the buyer cannot treat the resulting spend as vendor waste.

Shared defects need a middle lane. A connector timeout may involve vendor software, buyer network settings, and third-party API limits. A wrong leave answer may involve vendor reasoning and an outdated policy article. A duplicate candidate charge may involve ATS data quality and agent deduplication logic. A premium model route may involve a vendor default and a buyer failure to set a cap.

Those cases should not wait for executive escalation. The exception desk should have commercial defaults:

Fault pattern	Default answer
Buyer-controlled input defect	Billable, with required process correction
Vendor-controlled execution defect	Refund, exclusion, or service credit
Shared-process defect	Split treatment or temporary credit pending remediation
Evidence unavailable	Buyer-favorable credit unless buyer caused retention failure
Approved business surge	Billable, with forecast update
Unapproved cap breach	Disputed excess, workflow pause, and approval review
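The default table is effectively a lookup from fault pattern to commercial answer. The sketch below encodes it as a Python mapping; the enum names are hypothetical, and the handling for the retention carve-out (no default credit, routed to review) is an assumption, since the table only says when the buyer-favorable credit does not apply.

```python
from enum import Enum


class Fault(Enum):
    BUYER_INPUT = "buyer-controlled input defect"
    VENDOR_EXECUTION = "vendor-controlled execution defect"
    SHARED_PROCESS = "shared-process defect"
    EVIDENCE_UNAVAILABLE = "evidence unavailable"
    APPROVED_SURGE = "approved business surge"
    UNAPPROVED_CAP_BREACH = "unapproved cap breach"


DEFAULT_TREATMENT = {
    Fault.BUYER_INPUT: "billable, with required process correction",
    Fault.VENDOR_EXECUTION: "refund, exclusion, or service credit",
    Fault.SHARED_PROCESS: "split treatment or temporary credit pending remediation",
    Fault.EVIDENCE_UNAVAILABLE: "buyer-favorable credit",
    Fault.APPROVED_SURGE: "billable, with forecast update",
    Fault.UNAPPROVED_CAP_BREACH: "disputed excess, workflow pause, and approval review",
}


def default_treatment(fault: Fault, buyer_caused_retention_failure: bool = False) -> str:
    """Look up the default commercial answer for a classified exception."""
    if fault is Fault.EVIDENCE_UNAVAILABLE and buyer_caused_retention_failure:
        # Assumed handling for the table's carve-out: no default credit.
        return "no default credit; route to exception review"
    return DEFAULT_TREATMENT[fault]


print(default_treatment(Fault.VENDOR_EXECUTION))
```

Keeping the defaults in one table means a change negotiated at contract time updates every future ruling, instead of living in someone's memory.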

This structure is less dramatic than “pay only for outcomes.” It is more usable.

HR work is full of dependency chains. A payroll correction depends on timekeeping, manager approval, payroll cutoff, local rules, and employee communication. A hiring workflow depends on job intake, salary range, location, work authorization, manager criteria, candidate identity, assessments, scheduling, and human review. A leave case depends on jurisdiction, tenure, policy, manager approval, benefits carrier data, and employee documents.

Agents can improve those chains. They can also expose how fragile the chains were before automation.

Blanket refund rights ignore that fragility. Vendor-only blame encourages buyers to keep dirty processes. Buyer-only blame lets vendors charge for opaque automation. Shared responsibility forces both sides to make the workflow legible.

Legibility becomes the product.

Renewal Rooms Ask for the Dispute File

The 2027 renewal meeting will not be won by the vendor with the highest automation count.

The vendor will bring a dashboard showing completed workflows, cases deflected, candidates processed, payroll corrections routed, employee questions answered, manager summaries drafted, and hours saved. HR may support the story because employees and managers like the faster service. The CIO may support it because the workflow stayed inside governed tools. The CHRO may support it because recruiters and HR operations were under capacity pressure.

The CFO will ask for the dispute file.

How many workflow runs exceeded forecast? How many cap breaches were approved? How many retries were caused by vendor defects? How many model fallbacks were unapproved? How many charges were reversed? How many credits came from connector failures? How many overages came from buyer data problems? How many evidence exports were legally required? How many were misclassified? Which workflows produced durable value after the cost was counted?
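If exceptions are logged as structured records when they close, several of these renewal questions become counts rather than reconstructions. A minimal sketch, assuming a hypothetical ExceptionRecord with a free-text code from the taxonomy; it answers only a few of the questions above, and the sample records are invented.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ExceptionRecord:
    """One closed exception; field names and values are hypothetical."""
    workflow: str
    code: str  # e.g. "integration defect", "approved surge"
    over_forecast: bool
    cap_breach_approved: bool
    credited_usd: float


def dispute_file_summary(records: list[ExceptionRecord]) -> dict:
    """Roll closed exceptions up into a few of the renewal questions."""
    return {
        "runs_over_forecast": sum(r.over_forecast for r in records),
        "approved_cap_breaches": sum(r.cap_breach_approved for r in records),
        "credits_usd": round(sum(r.credited_usd for r in records), 2),
        "by_code": Counter(r.code for r in records),
    }


records = [
    ExceptionRecord("candidate-screening", "approved surge", True, True, 0.0),
    ExceptionRecord("payroll-correction", "integration defect", True, False, 42.50),
    ExceptionRecord("payroll-correction", "integration defect", False, False, 7.25),
]
summary = dispute_file_summary(records)
print(summary)
```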

The file will decide whether usage-based HR AI pricing scales.

If the file is empty because no one tracked exceptions, procurement will treat the entire AI bill as suspect. If the file is full of unresolved disputes, Finance will freeze the next wave of automation. If the file shows clean classification, fair credits, approved surges, and concrete workflow fixes, the vendor can defend expansion and the buyer can defend spend.

The exception desk is not a brake on AI adoption. It is the control that lets adoption survive contact with real budgets.

This matters for HR service providers as much as software vendors. RPO firms, staffing companies, payroll outsourcers, and shared service operators will use agents to reduce labor hours and improve throughput. They will also absorb or pass through model usage, workflow actions, SMS charges, identity checks, evidence exports, quality review, and exception handling. Buyers will ask whether service margin gains are being shared. Providers will answer that they are taking on tooling cost, governance work, and defect risk.

The dispute file will become the common record.

It will show whether the provider’s agent caused rework or whether the client supplied bad criteria. It will show whether a staffing workflow spent more because candidate volume rose or because the automation looped through failed messages. It will show whether a payroll outsourcer charged for successful corrections or for cases that had to be reopened after the pay cycle.

Agentic HR will not stay inside software procurement. It will reshape service economics.

The operating lesson is simple enough to write before the next pilot: no HR agent workflow should go live without a forecast, a cap, an exception codebook, a replay file, and a named cost owner. The workflow can still be ambitious. It can still automate real work. It can still use premium models and governed action layers when the risk justifies it.

It should not be allowed to surprise the company into trust.

The payroll lead still wants the overtime correction fixed before the pay run closes. The employee still wants the money. HR still wants a faster answer. The vendor still wants to be paid for useful work.

Finance wants to know why one correction became twelve billable events.

The exception desk is where that answer becomes a decision.

This article analyzes agent cost exceptions, usage overages, refund logic, and workflow-level budget controls for HR AI. Published May 20, 2026.