The Button Before the Decision

The recruiter did not approve a rejection.

She approved a tool call.

That distinction did not matter until the candidate challenged the result.

The hiring team at a regional logistics company had been using an AI assistant to move warehouse applicants through screening, interview scheduling, and first-shift readiness. The assistant did not make final decisions. The vendor said that clearly. It drafted summaries, checked availability, compared candidate records against job requirements, generated follow-up messages, and prepared a shortlist for recruiters.

On a Tuesday morning, the assistant asked to use a scheduling connector. The screen showed a familiar approval prompt. The connector would “retrieve shift compatibility and availability constraints.” The recruiter clicked approve. She had clicked the same button dozens of times.

The connector returned two pieces of data: a current availability form and an older shift rule from a stored template that had not been updated after a policy change. The model compressed the conflict into one sentence. The assistant wrote the summary into the ATS and recommended moving other candidates first.

The candidate later showed that he was available for the required shifts. The original rejection did not come from a single model error. It came from a permitted tool call, an outdated retrieval source, a broad approval prompt, and a human reviewer who had no reason to know the connector was returning two different policy versions.

The incident review produced four artifacts. The ATS record showed the final recommendation. The vendor log showed the model output. The subprocessor list named the companies involved. The chain-of-custody trace showed the systems touched.

None answered the question legal wanted answered.

Who approved that tool call, for that purpose, with that data scope, at that risk level, and what exactly did the approver see?

That is the missing control in HR AI.

Over the past month, the buyer conversation has moved from broad AI governance into operating evidence. Employers now need evidence packets, audit rooms, kill switches, quarantine layers, recovery SLAs, remediation warranties, evidence escrow, decision recall, correction propagation, and subprocessor chains of custody. Each layer matters. Each assumes the company can reconstruct what happened after the fact.

The next layer starts earlier.

Before an agent retrieves salary history, queries performance notes, calls a background-check status tool, reads an employee-relations case, writes a candidate status, generates a payroll correction, or sends a manager packet, someone or something has to approve the action.

That approval cannot remain a checkbox buried in a vendor admin console.

It has to become a runtime tool approval ledger.

The ledger is not a policy document. It is a record of permission at the moment of use. It links the agent, user, tool, model, MCP server, data class, business purpose, risk tier, approval rule, human approver where required, expiration time, output destination, and later review outcome.

In ordinary software, this may sound excessive.

In HR AI, the tool call is becoming the decision surface.

Why Approval Moved From Policy to Runtime

HR AI adoption has reached the awkward middle stage. It is common enough to affect real workflows, but not mature enough to have stable governance.

On April 30, 2026, ICIMS and Aptitude Research released survey findings from more than 400 U.S. talent acquisition leaders and practitioners. Sixty-nine percent of companies said they were using AI in some capacity in talent acquisition. Only 18% said they were using AI broadly across hiring processes.

The use cases were no longer peripheral. Screening led at 58%, candidate communication followed at 54%, assessments at 50%, and sourcing at 46%. Nearly half of companies, 46%, said they were using or planning to use agentic AI in talent acquisition.

Those numbers explain why approval has become more difficult. A resume-screening tool can be reviewed as a product. An agentic recruiting workflow has to be reviewed as a sequence of actions.

A recruiter may ask for a candidate slate. The agent may query the ATS, call a scheduling tool, retrieve a job requirement, inspect a skills taxonomy, summarize assessment results, draft a message, and update a CRM status. One user request can become seven tool calls. Some are harmless. Some touch sensitive data. Some write records. Some affect who receives an interview.

Blanket approval cannot carry that load.

The same ICIMS report found that 82% of companies considered transparency and explainability important, while 45% did not yet have a formal AI governance framework. It also found that recruiters overrode AI recommendations in 58% of organizations when conflicts arose.

That last number is easy to misread. It suggests humans still matter. They do. But a recruiter can only override what she can see. If the approval screen hides the data source, tool capability, policy version, model route, or downstream write action, the human is approving a label, not a decision boundary.

SHRM’s 2026 HR AI report shows the organizational reason this gap persists. SHRM surveyed 1,908 HR professionals in December 2025 and found that 39% had AI adopted in their HR functions, while another 7% intended to launch AI in HR during the year. Across the full sample, 62% said their organizations were using AI somewhere.

Yet more than half, 56%, said they did not formally measure the success of AI investments at all. Legal and compliance led AI governance and oversight in 37% of organizations. More than half, 52%, said HR was not directly or collaboratively involved in overall AI strategy and vision.

That produces a fractured approval model.

Legal approves vendor terms. Security approves access. IT approves integration. Procurement approves purchase. HR approves workflow. Managers approve outputs. Recruiters approve recommendations. The candidate or employee experiences the result.

Nobody owns the moment when the agent says: may I call this tool now?

The answer used to be hidden inside implementation. A vendor’s engineering team chose which APIs the product could call. A customer admin configured integrations during deployment. A security review checked scopes. After launch, the tool ran.

Agentic HR breaks that model because the runtime path changes with each task. A tool that is low-risk in one context can become high-risk in another. Reading a job description to draft interview questions is not the same as reading employee-relations notes to draft a performance summary. Checking shift availability is not the same as writing a schedule change. Retrieving a public policy is not the same as retrieving pay history.

The same connector can cross the line.

This is why approval is moving from procurement to runtime. The buyer still needs predeployment review. It also needs per-use evidence for high-risk actions.

The approval event is no longer administrative. It is part of the employment record.

MCP Turned Tools Into Decision Surfaces

The Model Context Protocol did not invent tool use. It standardized the way AI systems connect to tools and data sources. That made the governance problem more visible.

MCP gives agents a common way to reach files, databases, APIs, workflow systems, developer tools, business applications, and internal knowledge sources. For enterprises, that is useful because it reduces one-off integration work. For HR, it is risky because a standard tool interface can quickly become a standard path into candidate, employee, payroll, performance, scheduling, learning, and employee-relations records.

The official MCP security materials are direct about the authorization problem. The MCP authorization specification says clients must use OAuth resource indicators so tokens are requested for the intended MCP server, and MCP servers must validate that presented tokens were issued specifically for them. It also says an MCP server must not pass through a token it receives from a client.

The MCP security best practices explain why. Token passthrough can break accountability and audit trails because the MCP server may not distinguish clients, while downstream systems may log requests under the wrong source or identity. The document also calls for per-client consent to prevent confused-deputy attacks.
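The audience-binding and no-passthrough rules can be sketched as a small validation step. This is an illustrative sketch, not code from any MCP SDK: the claim names, exception type, and helper functions are assumptions.

```python
# Hypothetical sketch of the checks the MCP authorization spec describes:
# an MCP server must confirm a presented token was issued for it
# specifically, and must never forward the client's token downstream.

class TokenRejected(Exception):
    pass

def validate_token_for_server(claims: dict, server_resource_id: str) -> None:
    """Reject any token whose audience does not name this MCP server."""
    audience = claims.get("aud")
    audiences = audience if isinstance(audience, list) else [audience]
    if server_resource_id not in audiences:
        raise TokenRejected(
            f"token audience {audiences!r} does not include {server_resource_id!r}"
        )

def call_downstream(server_credentials: str, request: dict) -> dict:
    # The server uses its OWN credentials downstream -- it never passes
    # through the client's token -- so downstream logs attribute the
    # request to the right identity.
    return {"authorized_as": server_credentials, "request": request}
```

The point of the second function is the accountability property the best-practices document describes: downstream systems see the server's identity, not an opaque forwarded token.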

Those are security controls. In HR, they become decision controls.

If a recruiting agent calls a candidate-screening tool with a token meant for a broader HR system, the problem is not only unauthorized data access. It is that the later decision record may misstate who acted, what authority was used, and which data was available. If a performance agent calls a document retrieval tool through a proxy that cannot distinguish the client, the employer may not know whether the output relied on performance notes, informal manager comments, or employee-relations material. If a payroll agent writes through a shared service account, the downstream system may show an automation event without the human request, agent identity, or approval rule that triggered it.

Approval has to sit at the same level of precision as authorization.

The tool call needs a purpose. The token needs an audience. The data request needs a scope. The approval needs a record.

This is especially important because MCP tools are not all the same. A tool may be read-only, write-capable, destructive, external-facing, local, tenant-bound, vendor-operated, customer-operated, or run by an implementation partner. It may return raw personal data, derived scores, policy text, calendar slots, pay bands, interview feedback, background-check status, accommodation notes, or case history.

The label “MCP server” says little about employment risk.

A useful HR approval model has to classify tool calls by what they can do:

| Tool-call class | HR example | Approval problem |
| --- | --- | --- |
| Read public or low-risk content | Retrieve job description, public policy, interview guide | Low risk, but still needs source and version logging |
| Read candidate or employee record | Query ATS profile, HRIS field, skills profile, shift availability | Requires purpose, data minimization, and identity trace |
| Read sensitive employment context | Employee-relations case, pay history, accommodation record, disciplinary note | Requires high-risk approval or explicit policy rule |
| Generate decision input | Candidate ranking, promotion packet, performance summary, pay correction recommendation | Requires model route, tool inputs, reviewer visibility, and downstream destination |
| Write or change workflow state | Update candidate status, create payroll case, route employee service ticket, change schedule | Requires write authority, rollback plan, and action receipt |
| External or third-party call | Enrichment provider, assessment vendor, background check, model fallback, partner MCP server | Requires subprocessor mapping, retention terms, and evidence export |
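The classification above can be expressed as a small decision function. The enum values mirror the table; the capability flags and sensitive-data class names are assumptions about what a tool registry might record, not a real product schema.

```python
# Illustrative classifier for tool calls, ordered so the riskiest
# applicable class wins. Class names follow the table; the flags and
# data-class labels are hypothetical.

from enum import Enum

class ToolCallClass(Enum):
    READ_PUBLIC = "read public or low-risk content"
    READ_RECORD = "read candidate or employee record"
    READ_SENSITIVE = "read sensitive employment context"
    DECISION_INPUT = "generate decision input"
    WRITE_STATE = "write or change workflow state"
    EXTERNAL_CALL = "external or third-party call"

# Assumed sensitive employment data classes.
SENSITIVE_CLASSES = {"employee_relations", "pay_history", "accommodation", "discipline"}

def classify_tool_call(reads: set, writes: bool, external: bool,
                       produces_decision_input: bool) -> ToolCallClass:
    if external:
        return ToolCallClass.EXTERNAL_CALL
    if writes:
        return ToolCallClass.WRITE_STATE
    if produces_decision_input:
        return ToolCallClass.DECISION_INPUT
    if reads & SENSITIVE_CLASSES:
        return ToolCallClass.READ_SENSITIVE
    if reads - {"public"}:
        return ToolCallClass.READ_RECORD
    return ToolCallClass.READ_PUBLIC
```

The ordering is the design choice that matters: a write-capable tool is governed as a write even when it also reads, which is exactly how the same connector "crosses the line" described earlier.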

Most organizations already have fragments of this model. Security teams manage scopes. HR teams manage process rules. Vendors manage product permissions. Legal manages sensitive data categories. IT manages integrations.

The runtime ledger is where those fragments meet.

It records that this agent, acting for this user or workflow, called this tool, under this approved rule, for this HR purpose, within this data scope, producing this output, sent to this destination, with this later review.

That is not merely audit logging. It is approval provenance.

What Microsoft and ServiceNow Are Really Shipping

The market is starting to expose the missing layer.

Microsoft’s Agent 365 is the clearest sign that MCP governance is moving into mainstream enterprise administration. On May 1, 2026, Microsoft announced Agent 365 general availability for commercial customers. The company said that starting in June 2026, Microsoft Defender would provide asset context mapping for agents, including the devices they run on, MCP servers configured for those agents, associated identities, and reachable cloud resources.

That is the after-the-fact map.

The more important approval signal is Microsoft’s Bring Your Own MCP server preview. Microsoft describes a developer-to-admin flow: a developer registers a remote MCP server with server URL, authentication type, and tools to expose; an IT admin reviews the server and declared tools in the Microsoft 365 admin center; the admin approves or rejects the request; upon approval, the admin grants required Entra permissions; security teams monitor MCP server activity and tool invocations through Defender advanced hunting.

The controls listed are direct: approval or rejection before use, server-level block, tools snapshot, and runtime enforcement so blocked MCP servers cannot be invoked.

This is not HR-specific. That is why it matters.

HR buyers rarely want to build their own agent security stack. They want existing enterprise control planes to understand HR risk well enough to govern it. Agent 365’s BYO MCP flow gives IT and security a place to approve tools. HR still needs to bring the employment context: which tools can be used for candidate screening, which for employee service, which for payroll, which for performance, which require human approval, and which should never be exposed to an agent.

ServiceNow is moving from the workflow side. In its AI Control Tower documentation for the Australia release, ServiceNow describes approval controls for AI assets and workflows. AI steward approval can block deployment of AI systems, MCP servers, and AI models until approved. The MCP server setting can prevent unapproved servers from being used in AI Agent Studio, and the documentation says an unapproved server produces an approval-needed message before tools can be displayed.

At Knowledge 2026, ServiceNow also said AI Control Tower had expanded across five dimensions: discover, observe, govern, secure, and measure. It described 30 new enterprise integrations across cloud providers and enterprise applications including Workday, as well as runtime observability into AI agent behavior through the Traceloop acquisition.

ServiceNow’s advantage is not just approval. It owns work.

If an HR AI incident becomes a case, a ServiceNow workflow can assign owners, set SLA timers, trigger approvals, send notices, preserve logs, and track closure. That makes it a natural place to connect tool approval with downstream action: the agent asked to call a tool, the steward approved under a policy, the tool returned data, the workflow used it, a human reviewer accepted or changed the output, and the case retained the evidence.

Workday starts from the workforce system of record. Its Agent System of Record describes an agent analytics hub for visibility and accountability, lifecycle management from registration to retirement, and a gateway for managing and metering agent interactions, including third-party agents. Workday says ASOR applies identity permissioning, observability, and data security measures across fragmented data sources.

In its Agent Gateway announcement, Workday described roles, data access, action control, partner agents, and shared protocols including MCP and A2A. It named a recruiting example with Paradox interview scheduling inside the Workday ASOR.

That is the HR-specific surface.

If a recruiting agent schedules interviews, answers candidate questions, and writes back to Workday Recruiting, the approval question cannot be answered only by an enterprise admin center. It also has to know the business process: who owns the requisition, whether the action affects candidate status, whether the tool uses availability data, whether candidate consent applies, whether the hiring manager sees the output, and whether the record is retained for later challenge.

The platform race is becoming an approval race.

Microsoft can govern agent tools across the productivity and security estate. ServiceNow can govern workflow action. Workday can govern people, money, and HR business process. The buyer will need all three kinds of evidence.

The vendor that can join them will sell trust.

Why Blanket Consent Fails

Blanket consent is attractive because HR work is repetitive.

Recruiters screen many candidates. Payroll teams handle recurring exceptions. Employee-service teams answer similar policy questions. Managers review performance packets every cycle. Schedulers move shifts daily. If every low-risk agent action required manual approval, the system would become unusable.

The answer is not to approve everything once.

The answer is to tier approval by consequence.

The Cloud Security Alliance’s 2026 survey shows why. On April 21, 2026, CSA reported that 82% of enterprises had unknown AI agents in their environments, while 65% had experienced AI agent-related incidents in the previous 12 months. Among those incidents, 61% involved data exposure, 43% operational disruption, and 35% financial losses. Only 21% had formal decommissioning processes.

In a later CSA analysis of shadow agents, the organization argued that the deeper issue is not only inventory, but governance coverage across cloud platforms, internal orchestration systems, SaaS applications, and LLM environments. It specifically listed weakened approval pathways as a downstream issue when security teams do not know an agent exists.

HR has the same problem with higher social stakes.

A hidden sales agent may expose customer data. A hidden HR agent may expose candidate data, pay history, disability accommodation records, investigation notes, promotion inputs, schedule constraints, union-sensitive material, or employee-relations cases. It may also affect access to work.

That makes blanket consent dangerous in three ways.

First, it hides purpose. An admin may approve a scheduling connector for interview coordination. Six months later, the same connector may be used in a performance attendance summary or workforce reduction analysis. The data source did not change. The purpose did.

Second, it hides scope drift. A tool may begin as a narrow read-only availability lookup. After a schema update, it may expose location preferences, previous attendance exceptions, manager notes, or write capability. The approval remains green, but the tool is not the same tool.

Third, it hides reviewer ignorance. A recruiter may approve a prompt that says “retrieve candidate fit evidence” without seeing that the agent can call assessment scores, public profile enrichment, old interview notes, and availability constraints. The approval record may show human involvement. It may not show meaningful understanding.

This is where the runtime ledger has to be more precise than the admin policy.

For low-risk actions, the ledger can record automatic policy approval. For high-risk actions, it should require a human or steward approval tied to a specific purpose and data scope. For prohibited actions, it should block the call and record the attempted invocation. For emergency actions, it should allow temporary approval with expiration and retrospective review.
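Those four approval modes can be sketched as one runtime decision function. The risk labels, return shape, and four-hour emergency window are illustrative assumptions, not a standard.

```python
# Minimal sketch of the runtime approval decision described above:
# auto-approve low risk, escalate high risk to a human or steward,
# block prohibited calls while recording the attempt, and grant
# expiring emergency approvals flagged for retrospective review.

from datetime import datetime, timedelta, timezone

def decide(risk: str, emergency: bool = False) -> dict:
    now = datetime.now(timezone.utc)
    if risk == "prohibited":
        # Block the call, but still write the attempted invocation to the ledger.
        return {"allowed": False, "rule": "blocked", "record_attempt": True}
    if emergency:
        # Temporary approval: expires, and is flagged for later review.
        return {"allowed": True, "rule": "emergency",
                "expires_at": now + timedelta(hours=4),
                "review_required": True}
    if risk == "high":
        # The call waits on a human or steward approval tied to purpose and scope.
        return {"allowed": False, "rule": "human_approval_required",
                "escalate": True}
    # Low-risk calls get automatic policy approval, still logged.
    return {"allowed": True, "rule": "auto_policy"}
```

Every branch produces a record, including the denied ones; the blocked attempt is evidence too.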

The approval model should look less like a single consent screen and more like a credit policy.

The organization decides risk limits in advance. Runtime systems apply those limits to each transaction. Exceptions are escalated. Approvals expire. High-risk transactions are sampled or reviewed. Repeated exceptions become policy changes.

HR AI needs the same discipline because every tool call has a potential downstream audience: candidate, employee, manager, auditor, regulator, vendor, court, or future model training process.

Approving a tool is not just allowing computation.

It is deciding what evidence is allowed to enter an employment workflow.

What the Ledger Must Capture

A runtime tool approval ledger should be small enough to use and complete enough to matter.

A log dump will not work. HR, legal, security, procurement, and business leaders need a case-readable record, not a thousand-line trace that only the vendor can interpret. The ledger should turn each high-risk tool call into a structured approval event.

At minimum, it should answer twelve questions.

| Ledger field | Question it answers | Why it matters |
| --- | --- | --- |
| Agent identity | Which agent requested the tool call? | Separates human, delegated agent, own-credential agent, and service account activity |
| Human or workflow sponsor | On whose behalf was the agent acting? | Links the action to a recruiter, manager, HR partner, payroll owner, or automated workflow |
| Tool or MCP server | What tool, connector, model, or data source was invoked? | Prevents generic “AI used” records |
| Tool version and schema | What capabilities existed at the time? | Captures scope changes and later drift |
| Business purpose | Why was the tool called? | Distinguishes interview scheduling from screening, pay correction, performance, or employee relations |
| Data classes | What categories of data could be read or written? | Supports minimization, sensitive-data controls, and later explanation |
| Authorization boundary | What token, audience, credential, role, or delegated permission was used? | Connects approval to actual enforceable access |
| Approval rule | Was the call auto-approved, human-approved, steward-approved, denied, or escalated? | Shows whether policy or person allowed the action |
| Approver context | What did the approver see at the time? | Tests whether the approval was meaningful |
| Output destination | Where did the result go? | Connects approval to downstream recall and correction propagation |
| Expiration and revocation | When does approval end, and how can it be withdrawn? | Prevents permanent permission from one temporary need |
| Review outcome | Was the call later sampled, challenged, corrected, or tied to an incident? | Closes the loop between permission and consequence |
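One way to make the twelve questions concrete is a flat record that can be exported for a case review. The field names and JSON export are illustrative, not a standard schema.

```python
# Hypothetical structure for one approval event, one field per question
# in the table, serialized to JSON for an evidence export.

import json
from dataclasses import dataclass, asdict

@dataclass
class ToolApprovalEvent:
    agent_identity: str          # which agent requested the call
    sponsor: str                 # human or workflow on whose behalf
    tool: str                    # tool, MCP server, model, or data source
    tool_version: str            # capabilities at the time (drift evidence)
    business_purpose: str        # scheduling vs. screening vs. payroll ...
    data_classes: list           # categories readable or writable
    authorization: str           # token audience, role, or delegated scope
    approval_rule: str           # auto, human, steward, denied, escalated
    approver_context: dict       # what the approver actually saw
    output_destination: str      # where the result went
    expires_at: str              # when the approval ends
    review_outcome: str = "pending"  # later sample, challenge, or correction

    def export(self) -> str:
        return json.dumps(asdict(self), indent=2)
```

A flat record like this is deliberately case-readable: one event answers all twelve questions without requiring the vendor to interpret a raw trace.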

The hardest field is approver context.

Many systems can record that a person clicked approve. Fewer can show what the person understood. In HR AI, that matters. A manager who approves “use employee data” has not necessarily approved the use of employee-relations notes. A recruiter who approves “use assessment results” has not necessarily approved a third-party proctoring risk score. A payroll analyst who approves “review exception history” has not necessarily approved writing a correction to the payroll queue.

The ledger should preserve the approval screen or structured equivalent: tool name, operator, data classes, write capability, external recipient, retention class, risk tier, and plain-language consequence.

It should also preserve negative evidence. If the agent did not call a background-check tool, did not use an external enrichment provider, did not retrieve employee-relations notes, or did not write a status change, the ledger should be able to prove that absence for high-risk cases.

The risk tier should drive the burden.

| Risk tier | Example HR action | Approval requirement |
| --- | --- | --- |
| Tier 1: consequential decision support | Candidate ranking, promotion packet, pay correction, termination-risk summary, schedule allocation | Explicit approval rule, high-risk data scope, human or steward approval where required, complete evidence packet |
| Tier 2: active workflow execution | Interview scheduling, employee-service case routing, onboarding task assignment, payroll exception triage | Approved tool registry, purpose binding, write receipt, owner review |
| Tier 3: advisory content | Job description draft, policy summary, learning recommendation, manager coaching prompt | Automatic policy approval, source version, output owner |
| Tier 4: low-risk productivity | Formatting, translation, meeting summary with no employment action | Basic usage log and retention class |

This tiering keeps the ledger from becoming theater.

Not every action deserves a legal-grade approval record. The key is to capture the moments where a tool call can change access to hiring, pay, scheduling, promotion, performance, leave, benefits, employee service, discipline, or termination.

Those are not engineering details.

Those are employment controls.

How Vendors and Buyers Will Negotiate It

The runtime tool approval ledger will become a contract issue because the buyer cannot build it alone.

The employer may own HR policy. It may own internal business process. It may own workforce data. It may own Microsoft 365, Workday, ServiceNow, an ATS, a payroll system, or a data warehouse. But the vendor often owns the agent runtime, model routing, prompt templates, tool registry, connector implementation, support traces, and product telemetry.

If the vendor cannot expose approval evidence, the employer cannot fully prove control.

This will change procurement in four places.

First, buyers will ask for a tool registry, not just a subprocessor list. The registry should identify approved MCP servers, APIs, models, connectors, data sources, and write actions for each HR workflow. It should include version, operator, data classes, write capability, retention, and whether the customer can block or approve the tool separately.

Second, buyers will ask for purpose binding. A tool approved for scheduling should not automatically be approved for screening, performance, payroll, or employee relations. The same data source can carry different risk depending on workflow. Contracts should require the vendor to support purpose-based controls, not only role-based controls.

Third, buyers will ask for approval evidence export. When a candidate, employee, auditor, or regulator challenges a result, the employer should be able to export the approval ledger for the relevant output. That export should show the agent, sponsor, tool, purpose, data scope, approval rule, approver context, output destination, and later review state.

Fourth, buyers will ask for drift review. A vendor should notify customers when a tool’s schema, permission, model route, fallback behavior, data retention, or external participant changes in a way that affects approval risk. The approval ledger becomes stale if the tool changes while the approval remains the same.
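One concrete way to implement the drift check is to fingerprint the tool's declared schema at approval time and compare it on each later call. This is a sketch under assumptions: the hashing choice and schema shape are illustrative, not a vendor requirement.

```python
# Sketch of a staleness check: hash the tool's declared schema when the
# approval is granted; if the current schema hashes differently, the
# approval is stale and must be re-reviewed.

import hashlib
import json

def schema_fingerprint(tool_schema: dict) -> str:
    # Canonical JSON so key order does not change the fingerprint.
    canonical = json.dumps(tool_schema, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

def approval_is_stale(approved_fingerprint: str, current_schema: dict) -> bool:
    return schema_fingerprint(current_schema) != approved_fingerprint
```

The second of the three dangers described earlier, scope drift, is exactly what this catches: a lookup that quietly gains write capability no longer matches the fingerprint its approval was granted against.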

The strongest vendors will treat this as a product feature.

They will let customers define high-risk HR workflows, map tools to those workflows, set approval tiers, preserve approval context, expose audit exports, and connect tool approval to incident response. They will also help customers separate internal approvals from vendor approvals: IT approves a connector, HR approves employment purpose, legal approves sensitive-data use, security approves token scope, and the business owner approves write action.

The weakest vendors will treat it as a support request.

They will say logs exist but are not customer-facing. They will say approval is handled by admin roles. They will say tool invocations are too low-level for HR. They will say model routing is proprietary. They will say human approval is the customer’s responsibility. They will say API scopes are enough.

Some of that will be true. It will not be enough.

The NIST AI RMF Generative AI Profile gives buyers a language for the broader requirement. NIST frames generative AI risk across governance, provenance, pre-deployment testing, and incident disclosure, and it treats third-party risk, lifecycle risk, and ecosystem-level risk as part of the management problem. HR AI procurement will translate that into operational clauses: approval records, tool-call provenance, supplier change notice, incident support, and evidence retention.

The business reason is simple.

When an AI-assisted HR action is challenged, the employer cannot answer with “the vendor had permission.” It must answer with what permission, for which action, under which policy, using which data, approved by whom, and reviewed how.

That is the difference between a defensible control and a screenshot.

The Approval That Survives the Incident

The logistics company changed its approval workflow after the candidate dispute.

It did not ban the scheduling connector. The connector was useful. Recruiters needed availability data, and the AI assistant saved time in a high-volume process. The company did not require manual approval for every tool call either. That would have pushed recruiters back into spreadsheets.

Instead, it changed the approval record.

Availability lookup for interview scheduling remained automatically approved. Availability use for candidate ranking required a specific purpose tag. Any connector result that included multiple policy versions had to show the source date. Any write-back to candidate status required a separate action receipt. The approval prompt displayed whether the tool was read-only or write-capable. High-risk calls expired after the requisition closed. A weekly review sampled approval events where AI recommendations were accepted without recruiter edits.

The next candidate challenge was easier.

The company could show the agent, the recruiter, the tool, the source document, the policy version, the approval rule, the output, and the final human action. The answer was not perfect. It was specific.

That is what HR AI needs now.

The next wave of enterprise agents will not wait for HR governance to mature. Agents will enter recruiting, employee service, payroll, workforce management, learning, performance, internal mobility, and manager workflows because the workload pressure is real and the platforms are making deployment easier.

The approval layer has to move just as quickly.

An agent inventory tells the company what exists. A subprocessor chain of custody tells it where an output traveled. An evidence packet tells it what happened. A correction ledger tells it what was fixed.

The runtime tool approval ledger tells it why the agent was allowed to act in the first place.

That is the record that has to survive the incident.

The button is no longer just a button.

It is the first line of the audit trail.


This article provides a deep analysis of HR AI runtime tool approval ledgers. Published May 10, 2026.