The Fourth System on the Call

The subprocessor list had three names.

The incident log had four.

That was the problem.

A large retailer had been piloting an AI recruiting assistant for hourly hiring. The workflow looked ordinary on the surface. Candidates applied through the ATS. The vendor’s assistant summarized work history, checked availability against store requirements, drafted follow-up messages, and recommended which applicants should move to the next step.

One candidate challenged the result. He said the system had treated his overnight availability as a conflict even though he had selected the correct shifts. The recruiter opened the ATS and saw the vendor summary. The vendor opened its console and saw the model output. Legal asked for the subprocessor list. Procurement sent the contract exhibit.

The exhibit named a cloud provider, a foundation model provider, and an analytics vendor.

The runtime trace showed a different path. The recruiting assistant had called a scheduling-data connector through an MCP server, queried a knowledge base maintained by a separate implementation partner, used a fallback model when the primary model timed out, and wrote the recommendation back into the ATS through a service account shared across multiple workflow automations. A later email draft copied part of the same summary into Microsoft 365.

Nobody had lied in the contract. The list was just too static for the way the product actually worked.

The employer needed more than the list of companies named as subprocessors in the privacy exhibit. It needed to know which system touched this candidate, at this time, for this purpose, using this credential, under this customer configuration, with this model route, producing this output, sending it to these downstream places.

That is a different artifact.

It is a subprocessor chain of custody.

The phrase sounds like legal inventory. It is not. It is an operating record for agentic HR. It connects the people who buy HR AI, the vendors who sell it, the model providers that power it, the MCP servers that extend it, the identity systems that authorize it, the workflow platforms that execute it, and the downstream systems that retain its outputs.

Without that chain, the controls built over the last month remain incomplete. A vendor remediation warranty says the vendor will help when something goes wrong. Evidence escrow says the records should exist before the incident. Decision recall says the old output must be located and withdrawn. Correction propagation says the corrected record must reach every downstream copy.

All of those controls assume one thing: the company can reconstruct the path.

In HR AI, that assumption is becoming fragile.

Why the Vendor Boundary Is Disappearing

The old HR software map was simple enough for procurement to understand. A buyer contracted with an ATS, HRIS, assessment vendor, payroll provider, background-check company, or employee-service platform. The vendor listed its subprocessors. Security reviewed SOC 2, penetration-test summaries, data processing terms, uptime promises, and breach notice language. Legal negotiated liability caps and audit rights.

That structure still matters. It no longer describes the full machine.

Modern HR AI products increasingly behave like small supply chains at runtime. A recruiting assistant may call a foundation model, a resume parser, a scheduling engine, an identity provider, a CRM integration, a document store, a messaging service, and an analytics warehouse before a recruiter sees one recommendation. A performance agent may retrieve feedback from collaboration tools, summarize manager notes, query skills data, compare job architecture, and draft a calibration packet. A payroll agent may inspect timekeeping records, tax rules, employee-service cases, and exception histories before it suggests a correction.

The buyer sees one product. The decision path may cross ten systems.

The speed of adoption explains why this issue arrived before most contracts were ready. On April 30, 2026, iCIMS and Aptitude Research reported survey data from more than 400 U.S. talent acquisition leaders and practitioners. Sixty-nine percent of companies said they were using AI in some capacity in talent acquisition, but only 18% said they were using it broadly across hiring processes.

Candidates were moving faster: 74% of companies said candidates were using AI in the job search. Screening led employer use cases at 58%, followed by candidate communication at 54%, assessments at 50%, and sourcing at 46%. Nearly half, 46%, said they were using or planning to use agentic AI in talent acquisition.

Those numbers matter because agentic AI does not stay inside one vendor boundary. Screening touches job data, candidate data, scoring rules, resume parsing, communications, audit logs, and recruiter actions. Candidate communication touches email, SMS, CRM, scheduling, consent, language generation, and templates. Assessments touch identity verification, proctoring, simulation engines, scoring models, and accommodation records. Sourcing touches public profiles, enrichment data, email tools, CRM workflows, and sometimes external search.

The same report found that 82% of companies considered transparency and explainability important, while nearly half lacked a formal AI governance framework. That is the buying gap. Employers want trust. The workflow is already distributed.

SHRM’s 2026 HR AI research shows the governance gap from inside HR. In a survey of 1,908 HR professionals, SHRM found that 39% had AI adopted in HR functions. More than half, 56%, said they did not formally measure the success of AI investments at all.

Legal and compliance led AI governance and oversight in 37% of organizations. In states with workplace-related AI laws or regulations, 57% of HR professionals said they were not aware of those policies.

That is not only a training problem. It is a system visibility problem.

When HR does not lead the architecture, security does not know the employment context, procurement sees only the vendor contract, legal sees only the risk language, and managers see only the output, nobody owns the full chain. Everyone owns a fragment.

Agentic HR makes fragmentation more expensive. A bad output may be created by a model. It may be triggered by a workflow rule. It may be fed by stale data from a connector. It may be caused by an overbroad token. It may be shaped by a prompt template. It may be amplified by a manager packet. It may be retained by a vendor telemetry system. It may be reproduced in an analytics report.

If the employer cannot separate those layers, it cannot answer the most basic incident question.

Where did the decision come from?

The Static List and the Runtime Graph

The subprocessor exhibit was built for a slower software world.

It names third parties that process personal data for the vendor. It may include cloud hosting, customer support, analytics, email delivery, monitoring, payment systems, subprocessed professional services, and sometimes model providers. It is useful for privacy review and contract diligence. It tells the buyer which organizations may handle data.

It rarely tells the buyer what happened in one employment decision.

That distinction used to be tolerable. If a payroll platform used one cloud host and one email provider, the subprocessor list could sit in a contract folder and still be a reasonable map of the risk. If an ATS sent candidate data to a background-check provider, the integration was visible. If an assessment vendor used a proctoring partner, procurement could ask for that vendor’s documentation.

An agentic workflow is less stable. It is a runtime graph.

The graph may include:

| Runtime layer | What the buyer needs to know | Why the static list is not enough |
| --- | --- | --- |
| Foundation model | Which model, version, region, fallback, and routing rule produced or shaped the output | The contract may name the provider but not the model path used for a specific output |
| MCP server or tool connector | Which tool was invoked, under whose approval, with what schema and permissions | The subprocessor exhibit may not name every tool provider or local server |
| Retrieval source | Which document, policy, job profile, skills graph, or case history was retrieved | The source may sit inside the buyer’s tenant, a partner knowledge base, or a vendor index |
| Identity and token path | Which human, agent, service account, or delegated credential authorized the action | Privacy lists do not show token audience, privilege scope, or account sharing |
| Workflow platform | Which business rule, approval flow, SLA timer, and case process executed after the model output | The execution layer may be a separate platform from the AI vendor |
| Human review | Which recruiter, manager, HR partner, payroll analyst, or case owner saw and changed the output | A vendor log may not capture internal review behavior |
| Downstream copy | Where the output was written, exported, summarized, emailed, or retained | A source-system correction may not reach copied packets or analytics stores |
| Vendor telemetry | Which logs, prompts, evaluation traces, and support records the vendor retained | The employer may not know what survives model updates or data retention windows |
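
To make the graph concrete, here is a minimal sketch of what one step in it could look like as a record, in TypeScript. Every field name here is an assumption for illustration, not any platform’s actual schema.

```typescript
// Hypothetical shape of a single custody event; field names are illustrative.
interface CustodyEvent {
  eventId: string;            // unique ID for this step in the chain
  timestamp: string;          // ISO 8601 time the step executed
  layer:
    | "model"                 // foundation model call, including fallback routes
    | "tool"                  // MCP server or connector invocation
    | "retrieval"             // document, policy, or knowledge-base fetch
    | "identity"              // token issuance or delegation
    | "workflow"              // business rule, SLA timer, or approval step
    | "human_review"          // recruiter, manager, or HR partner action
    | "downstream_write"      // copy written to another system
    | "telemetry";            // vendor-retained logs or traces
  actor: {
    kind: "human" | "agent" | "service_account";
    id: string;               // who or what performed the step
    onBehalfOf?: string;      // delegated user, if any
  };
  authority: {
    tokenAudience?: string;   // which resource the credential was issued for
    scopes: string[];         // privilege granted, not just privilege used
  };
  detail: Record<string, unknown>; // layer-specific payload: model route, tool args, and so on
  priorEventId?: string;      // link to the preceding step, forming the chain
}
```

The `priorEventId` link is the part that matters. It is what turns isolated logs into a chain: each step names its predecessor, so the path can be replayed in order.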

The runtime graph is the thing that matters during an incident.

Suppose an AI assistant recommends that a warehouse candidate should not advance. The candidate claims the system misread a military logistics role. The employer opens the record and sees a summary. To investigate properly, the employer needs more than the vendor’s general explanation of its AI product.

It needs the input record, the prompt or instruction template, the model route, the retrieval materials, the tool calls, the scoring or ranking configuration, the recruiter action, the downstream recipients, and the vendor retention state.

It also needs the uncomfortable details: whether the output was generated by the primary model or a fallback, whether the MCP server pulled an old job requirement, whether a shared service account made the action look like it came from a recruiter, and whether a third-party enrichment provider added the data point that hurt the candidate.

The old question was: who are your subprocessors?

The new question is: which subprocessors touched this decision, in what order, with what authority, leaving what evidence?

That is why the phrase “chain of custody” fits. In legal and forensic settings, chain of custody is about preserving evidence integrity as it passes from one holder to another. HR AI needs the same discipline for decision evidence. The evidence is not only a document. It is a sequence of calls, permissions, data transformations, model outputs, human actions, and retained copies.

Without the sequence, a vendor can say “our model did not reject the candidate” while the employer says “the candidate was rejected after your tool summary,” while the implementation partner says “we only maintained the connector,” while the model provider says “we do not see customer business context,” while the ATS shows only the final status.

Everyone may be technically correct.

The candidate still has no answer.

MCP Made the Hidden Connector Visible

The Model Context Protocol did not create the problem. It made the shape of the problem easier to see.

MCP standardizes how AI systems connect to tools and data sources. The promise is straightforward: instead of writing one-off integrations for every assistant, developers can expose files, databases, APIs, workflow tools, and business systems through a common protocol. For enterprise AI, that is powerful. For HR, it is dangerous if procurement treats the protocol as plumbing rather than a decision surface.

The MCP security documents already show why. The MCP security best practices describe confused-deputy risks for proxy servers connecting clients to third-party APIs. The MCP authorization specification requires clients to use OAuth resource indicators and says access tokens must be issued for the intended MCP server. It also forbids token passthrough because a server that accepts and forwards the wrong token can bypass controls and damage auditability.
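
For readers who want the audience rule in code, here is a minimal sketch of the check an MCP server could run, assuming JWT access tokens and the open-source `jose` library. The URLs are placeholders, and the sketch is not any SDK’s actual implementation.

```typescript
import { createRemoteJWKSet, jwtVerify } from "jose";

// Placeholder identifiers; a real deployment uses its own issuer and resource URI.
const SERVER_RESOURCE = "https://mcp.example.com/hr-scheduling";
const jwks = createRemoteJWKSet(new URL("https://idp.example.com/.well-known/jwks.json"));

async function authorize(bearerToken: string): Promise<void> {
  // Verify the signature and, critically, that the token was issued for THIS
  // server (the RFC 8707 resource indicator), not for some other API.
  // jwtVerify throws if the signature or audience does not match.
  await jwtVerify(bearerToken, jwks, { audience: SERVER_RESOURCE });
  // Anything upstream must be called with a credential this server obtained
  // for itself. Forwarding the inbound token ("token passthrough") would let
  // one credential act outside its intended audience and blur the audit trail.
}
```

The point of the check is custody as much as security: when each hop carries a credential scoped to that hop, the trace shows who authorized what.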

That sounds technical. In HR, it becomes a people risk.

An HR agent with the wrong token boundary can reach more data than the workflow needs. A recruiter assistant can query employee records when it should only see candidate data. A performance agent can retrieve employee-relations notes that should be excluded from a calibration packet. A payroll support agent can use a service account that hides the human requester. A local MCP server can expose files on a manager’s device. A tool description can be vague enough that the model calls the wrong tool. A connector can quietly pass data to a downstream API that the buyer never reviewed as part of the employment workflow.

This is where chain of custody becomes more than logging.

The employer needs to know:

  • Which MCP servers were available to the agent at the time of the output.
  • Which tools the agent could call and which tools it actually called.
  • Which OAuth resource, token audience, service account, or delegated user authorized each call.
  • Whether the tool was read-only, write-capable, destructive, or open-world.
  • Which data was returned to the model and which data was withheld.
  • Whether a human approved the tool call or the workflow allowed it automatically.
  • Which party operated the server, patched it, logged it, and retained the trace.

Static vendor review does not answer those questions.
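
A runtime control can. One way to operationalize the read-only, write-capable, destructive, open-world distinction is a policy gate over MCP tool annotations. The sketch below assumes the advisory hint fields the MCP tool schema defines; the gating rules and numeric risk tiers are illustrative (the tiers anticipate the tiering table later in this piece), not a standard.

```typescript
// Advisory hints from the MCP tool schema. Servers self-report these, so an
// untrusted server's hints are claims, not guarantees.
interface ToolAnnotations {
  readOnlyHint?: boolean;
  destructiveHint?: boolean;
  openWorldHint?: boolean;
}

type Decision = "allow" | "require_human_approval" | "block";

// Illustrative gate: stricter at higher-risk tiers (tier 1 = consequential).
function gateToolCall(
  annotations: ToolAnnotations | undefined,
  riskTier: 1 | 2 | 3 | 4
): Decision {
  const a = annotations ?? {};
  // A missing readOnlyHint is treated as write-capable: the riskiest reading.
  const writeCapable = a.readOnlyHint !== true;
  if (a.destructiveHint === true) return "block"; // never auto-run destructive tools in HR flows
  if (writeCapable && riskTier <= 2) return "require_human_approval";
  if (a.openWorldHint === true && riskTier === 1) return "require_human_approval";
  return "allow";
}
```

Whatever the gate decides should itself be logged. A blocked tool call is evidence too.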

Microsoft’s May 1, 2026 Agent 365 announcement shows how quickly this is becoming a mainstream enterprise control issue. Microsoft said Agent 365 is generally available for commercial customers and framed it as a control plane for agents across Microsoft, partner, local, SaaS, and cloud environments. It also said Defender will provide context mapping for agents starting in June 2026, including the devices they run on, the MCP servers configured for those agents, the identities associated with them, and the cloud resources those identities can reach.

That list is close to a chain-of-custody schema.

It does not mention HR specifically. It does not need to. HR work lives inside the same estate: Outlook, Teams, SharePoint, Excel, ATS exports, manager documents, payroll files, case notes, and service workflows. When an HR agent touches those systems, agent context mapping becomes employment-decision evidence.

The security team may call it blast radius. The lawyer may call it discoverability. The HR operations leader may call it trust.

They are looking at the same graph.

The Platform Race Is Becoming a Custody Race

Microsoft, Workday, and ServiceNow are not solving the same problem from the same starting point. Together, they show why the chain-of-custody layer is moving into the center of enterprise software.

Microsoft starts with identity, endpoint, productivity, cloud, and compliance. Agent 365 can see local agents, cloud agents, SaaS agents, delegated agents, and agents with their own credentials. Defender and Intune can discover unmanaged local agents. Entra can help govern identity and network controls. Purview can handle retention, eDiscovery, and data governance. Microsoft also announced registry sync with Amazon Bedrock and Google Cloud connections in public preview, which moves the control plane across AI-builder platforms.

For HR, Microsoft’s advantage is that many decision artifacts escape HR systems into the productivity layer. A recruiter forwards a candidate summary. A manager downloads an interview packet. A payroll analyst works in a spreadsheet. An employee-service answer becomes an email. A performance note sits in a Teams conversation. The AI vendor may not control those copies. Microsoft may be closer to them than the HR system of record.

Workday starts from people, finance, and governed business process. Its Agent System of Record is now generally available and aims to let organizations manage agents with the same discipline they use for people, finance, and operations. Workday says more than 65 global partners are connecting their AI agents to ASOR. Through ASOR and Agent Gateway, Workday supports MCP, A2A interactions, and OpenTelemetry so customers can keep visibility into agent metrics.

Workday’s position matters because employment-impacting records often begin or end there: worker data, job architecture, compensation, performance, payroll, skills, onboarding, time, leave, and finance. If an AI agent touches those objects, the buyer will expect Workday to help identify who or what acted, on whose behalf, in which business process, with which partner agent, and under which configuration.

The hard part is not agent inventory. It is agent evidence.

ServiceNow starts from workflow execution. At Knowledge 2026, ServiceNow expanded AI Control Tower across five dimensions: discover, observe, govern, secure, and measure. It described 30 new enterprise integrations across clouds and applications including Workday, runtime observability from the Traceloop acquisition, risk frameworks aligned to NIST and the EU AI Act, least-privilege enforcement, and real-time shutdown when an agent goes beyond permissions.

The same week, ServiceNow opened Action Fabric to external agents through a generally available MCP Server spanning IT, HR, customer service, security, risk and compliance, and app development. That phrasing matters. ServiceNow is not only watching agents. It wants to become the governed action layer where agents execute work.

That could make ServiceNow a custody ledger for cross-functional HR AI incidents. A candidate dispute is not only an ATS record. It can become a case with legal, HR operations, security, the vendor owner, and the hiring manager. A payroll-agent error can become a workflow with payroll, HRIS, employee relations, finance, and the employee-service team. A performance-agent dispute can become a case with talent management, legal, the manager, and people analytics.

The platform that owns the case can ask the custody questions in order.

What system generated the output? What agent invoked the tool? What model route ran? What data was retrieved? What permission allowed it? Which business rule executed? Who reviewed it? Where was it written? Which vendor retained evidence? Which downstream systems need correction?

That is where the race is headed. The buyer does not only need an agent registry. It needs a record that survives the incident.

Regulation Is Turning Runtime Detail Into Employment Evidence

The regulatory trend is not asking employers to become protocol experts. It is asking them to prove consequential AI use.

The EU AI Act is the clearest starting point. Its Annex III lists employment, worker management, and access to self-employment as high-risk areas, including systems used to place targeted job ads, analyze and filter applications, evaluate candidates, make decisions affecting work relationships, allocate tasks, or monitor and evaluate performance and behavior. Article 12 covers record-keeping for high-risk systems. Article 86 creates a right to explanation for certain individual decisions.

Those obligations do not say “produce an MCP trace.” But in practice, an explanation of an employment-impacting AI output may require exactly that kind of trace. If the output depended on a third-party model, a tool call, a retrieval source, and a workflow rule, the employer cannot explain the decision by naming only the surface application.

California moves the issue into retention. The California Civil Rights Council’s final statement of reasons for automated-decision system employment regulations describes automated-decision system data broadly and says providers that sell or provide such systems to employers, or use them on an employer’s behalf, must retain relevant records for at least four years after the last date of use by the employer or covered entity.

Four years is a long time for product telemetry.

It is even longer for agent traces, model routes, prompt templates, connector states, and MCP server logs. If a vendor’s default retention keeps only short-lived debugging records, the employer may have a contract that satisfies privacy review and still fail the evidence test two years later.

Colorado is pushing in the same direction with rules for automated decision-making technology. The SB26-189 bill page describes a framework for consequential decisions and defines ADMT broadly as technology that processes personal data and generates outputs such as predictions, recommendations, classifications, rankings, or scores used to make, guide, or assist a decision about an individual. The bill summary says deployers would need to provide a plain-language description of a covered ADMT’s role within 30 days after an adverse consequential decision.

That requirement, if enacted in that form, would put pressure on the chain. A plain-language description is only credible if the deployer understands the machine well enough to simplify it. “Our vendor used AI” will not be enough. “The assistant queried the candidate record, retrieved the job availability rules, used a model to summarize the match, and routed the recommendation to a recruiter who accepted it” is closer. But if there was also a fallback model, an external data source, and a connector error, the description needs to capture the parts that mattered.

NIST’s Generative AI Profile gives buyers procurement language for this. The NIST AI RMF Generative AI Profile includes suggested actions for third-party risk: inventory third-party entities with access to organizational content, maintain records of changes made by third parties, update supplier risk assessment for embedded generative AI technologies, document incidents involving third-party GAI data and systems, establish incident response plans for third-party GAI technologies, and use vendor contracts and SLAs to define incident response times and critical support.

That is close to the HR AI buyer’s checklist.

The missing translation is employment context. A third-party GAI incident in HR is not only a security incident. It can affect hiring access, pay, scheduling, performance, accommodation, leave, promotion, termination, and internal mobility. The chain of custody must preserve both technical lineage and employment consequence.

That is the part most generic AI governance tools do not yet know how to ask.

What the Chain Must Prove

A useful HR AI subprocessor chain of custody should not become a giant log dump. Logs are raw material. The buyer needs a case-usable record.

That record should answer ten questions.

| Question | Evidence needed | Typical owner |
| --- | --- | --- |
| What started the AI action? | User request, workflow trigger, scheduled job, case state, event ID | HR operations or platform owner |
| Who or what acted? | Human user, delegated agent, own-credential agent, service account, sponsor, role | Identity and security |
| What data was used? | Candidate, employee, job, policy, payroll, performance, case, skills, or document sources | HRIS, ATS, data owner |
| Which tools were available? | MCP server list, tool schemas, permissions, allowed or blocked tools, version | IT, security, vendor |
| Which tools were called? | Tool-call trace, arguments, response payload class, timestamp, success or failure | Vendor and platform owner |
| Which model path ran? | Model provider, model version, prompt template, system instruction, fallback route, region | Vendor and AI governance |
| What output was produced? | Summary, ranking, score, recommendation, message, action, confidence, explanation | HR workflow owner |
| Who reviewed or changed it? | Reviewer identity, time spent, override, reason, approval, denial, escalation | Recruiter, manager, HR partner |
| Where did it go? | ATS, HRIS, payroll, email, Teams, case system, warehouse, vendor telemetry | Workflow and data teams |
| What survived? | Retention policy, evidence escrow, legal hold, export package, deletion or archive status | Legal, privacy, vendor owner |

The chain should also distinguish between four levels of evidence.

First is the system path: which applications, models, tools, APIs, and infrastructure participated. This is the part security teams usually understand.

Second is the authority path: which identity, token, permission, role, service account, approval, or delegated access allowed each step. This is where agent identity becomes employment governance.

Third is the data path: which personal data, derived data, inferred data, policy text, job requirement, performance note, or external enrichment shaped the output. This is where privacy and discrimination risk appear.

Fourth is the decision path: which human or automated workflow used the output, changed it, ignored it, routed it, or copied it. This is where HR accountability lives.
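
If custody events are tagged by layer, as in the earlier sketch, the four paths fall out of a single log. The bucketing below is illustrative; reasonable teams will place some layers differently.

```typescript
// Illustrative mapping from custody-event layers to the four evidence paths.
// "downstream_write" could arguably sit in the data path; the choice is a judgment call.
const evidencePath: Record<string, "system" | "authority" | "data" | "decision"> = {
  model: "system",
  tool: "system",
  telemetry: "system",
  identity: "authority",
  retrieval: "data",
  workflow: "decision",
  human_review: "decision",
  downstream_write: "decision",
};

// Pull one path's events out of a full chain for the reviewer who owns it.
function eventsForPath<T extends { layer: string }>(
  events: T[],
  path: "system" | "authority" | "data" | "decision"
): T[] {
  return events.filter((e) => evidencePath[e.layer] === path);
}
```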

Most vendors can produce some of this. Few can produce all of it in one package.

The buyer does not need every low-risk action to generate a litigation-grade dossier. A job-description drafting assistant does not need the same chain as an AI tool that filters candidates, recommends pay corrections, scores performance, or routes employee-relations cases. The chain should scale with risk.

One workable tiering model looks like this:

| Risk tier | Example workflow | Chain-of-custody requirement |
| --- | --- | --- |
| Tier 1: consequential decision support | Candidate ranking, promotion packet, termination risk, pay correction, schedule allocation | Full runtime chain, model route, tool calls, human review, downstream destinations, retention |
| Tier 2: active workflow execution | Interview scheduling, employee-service case routing, payroll exception triage, onboarding task assignment | Agent identity, trigger, tool calls, data classes, action receipts, reviewer or owner |
| Tier 3: advisory content | Job description draft, policy summary, learning recommendation, manager coaching prompt | Model route, prompt template category, source documents, output owner, retention class |
| Tier 4: low-risk productivity | Formatting, translation draft, meeting summary with no employment action | Basic usage log, data classification, deletion or retention policy |
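
A tiering model is only useful if something checks case packets against it. The sketch below is one simple way to do that; the tier-to-field mapping paraphrases the table above, and the field names are assumptions, not a standard vocabulary.

```typescript
// Required custody fields per tier, paraphrasing the table above.
const custodyRequirements: Record<1 | 2 | 3 | 4, string[]> = {
  1: ["model_route", "tool_calls", "retrieval_sources", "human_review",
      "downstream_destinations", "retention"],
  2: ["agent_identity", "trigger", "tool_calls", "data_classes",
      "action_receipts", "owner"],
  3: ["model_route", "prompt_template_category", "source_documents",
      "output_owner", "retention_class"],
  4: ["usage_log", "data_classification", "retention_policy"],
};

// Returns the evidence a case packet is missing for its tier. An empty result
// means the packet has the right shape; it does not mean the contents are true.
function missingEvidence(tier: 1 | 2 | 3 | 4, packetFields: Set<string>): string[] {
  return custodyRequirements[tier].filter((field) => !packetFields.has(field));
}
```

Run at deployment review, a check like this finds the gaps before a candidate dispute does.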

The point is not to create paperwork for its own sake.

The point is to avoid false precision. “AI was used” is too vague. “The vendor’s model made the decision” is often too simple. “A human reviewed it” is not enough if the human saw a polluted summary. “The subprocessor list names the model provider” is not enough if a different fallback model ran. “The data stayed in the customer tenant” is not enough if the agent called a local connector or external tool that returned sensitive material.

The chain-of-custody record should make those claims testable.

How Contracts Have to Change

The next HR AI procurement fight will be less about whether vendors use AI and more about whether they can prove the custody chain when AI is used.

The standard subprocessor clause will not disappear. It should become only the first layer. Buyers will need runtime obligations that sit beside privacy, security, audit, indemnity, and support terms.

Start with the event boundary. For high-risk HR workflows, a custody event should include model calls, fallback routes, retrieval calls, MCP tool invocations, write actions, human approvals, overrides, downstream sends, and vendor telemetry retention. The contract should also define which events are logged by default, which require customer configuration, which require paid retention, and which are unavailable.

Then separate a subprocessor from a runtime participant. A cloud host may be a subprocessor for the whole product. A model provider may be a runtime participant only for certain tasks. An MCP server may be operated by the customer, the vendor, a marketplace partner, or an implementation consultant. A workflow action may run in ServiceNow, Workday, Microsoft, the ATS, or the vendor’s own platform. A chain-of-custody clause should force the parties to name these roles.

For employment-impacting workflows, the buyer also needs a tool and model registry. It should know which models, MCP servers, connectors, data sources, and agent identities are approved for candidate, employee, payroll, performance, and employee-relations use. New runtime participants should require notice, review, or at least risk-tiered approval.

Evidence export needs its own clock. A vendor should be able to produce a case packet for a disputed output within a defined time window. For high-impact hiring, pay, performance, scheduling, or employee-service cases, waiting weeks for engineering to assemble logs will not work. The packet should be understandable to legal, HR, security, and the business owner, not only to the vendor’s engineering team.

Negative evidence matters too. Sometimes the important fact is that a system did not touch the record. A vendor should be able to show that no external model was used, no fallback route fired, no enrichment source was called, no write action occurred, or no downstream export happened. Absence should be provable.
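
Negative evidence is the hardest claim to support, because “no event” in an incomplete log proves nothing. One common approach, sketched below under the assumption of an append-only, hash-linked log, is to make completeness checkable first and assert absence second.

```typescript
import { createHash } from "node:crypto";

// Each event carries the hash of its predecessor, so a gap or edit anywhere
// breaks verification. The structure is illustrative, not a specific product.
interface ChainedEvent {
  payload: string;   // serialized custody event
  prevHash: string;  // hash of the previous event ("" for the first)
  hash: string;      // SHA-256 over prevHash + payload
}

function verifyChain(events: ChainedEvent[]): boolean {
  let prev = "";
  for (const e of events) {
    const expected = createHash("sha256").update(e.prevHash + e.payload).digest("hex");
    if (e.prevHash !== prev || e.hash !== expected) return false;
    prev = e.hash;
  }
  return true;
}

// "No fallback model fired" is only credible if the chain verifies AND the
// verified events contain no record of a fallback route.
function provesAbsence(events: ChainedEvent[], marker: string): boolean {
  return verifyChain(events) && events.every((e) => !e.payload.includes(marker));
}
```

The absence claim rides on the integrity claim. If `verifyChain` fails, “no fallback fired” is an opinion, not evidence.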

Finally, the contract should connect custody to remediation. If a vendor cannot produce the chain for a high-risk workflow, the warranty should not treat that as a minor support defect. It should trigger enhanced cooperation, cost sharing, remediation support, and, for repeated failures, termination or suspension rights. A missing custody chain can make the employer unable to answer a candidate, employee, auditor, regulator, or court.

That is a business risk.

The strongest vendors will not resist this forever. Chain of custody can become a sales advantage. A vendor that can show exactly which model, tool, permission, data source, and workflow action produced an output will be easier for legal and procurement to approve than a vendor that can only provide a SOC 2 report and a responsible AI statement.

The weaker vendors will describe the demand as unreasonable. Some will say the model provider does not expose the right trace. Some will say prompt logs create privacy risk. Some will say MCP servers are customer-operated. Some will say retention is too expensive. Some will say the workflow crosses systems they do not control.

Those may be valid constraints. They are not excuses.

They are the reason the buyer needs the chain before deployment.

The Receipt Before the Dispute

The retailer eventually reconstructed enough of the candidate incident to reopen the application. The scheduling connector had returned an old store rule. The fallback model had compressed the availability note too aggressively. The recruiter had accepted the summary because it looked routine. The ATS status was corrected. The candidate received another look.

The company fixed the case.

It did not fix the architecture.

The next week, procurement added a question to the vendor renewal checklist. It was not about accuracy. It was not about uptime. It was not even about the subprocessor list.

For any AI output used in hiring, pay, performance, scheduling, or employee service, can you produce a runtime chain of custody?

The vendor asked what that meant.

The buyer wrote it down: model, prompt, tool, MCP server, identity, token boundary, data source, workflow action, human reviewer, downstream destination, retention record, and export time.

That is the new receipt.

HR AI does not need fewer subprocessors. It will probably have more. It needs a way to prove how they moved through a decision before the decision is challenged.

The static list belongs in the contract.

The chain belongs in the record.

