Finance Wants the HR AI Scorecard Before Renewal

On May 21, 2026, Workday gave HR buyers a number that could fit neatly into a renewal deck.

Its Recruiting Agent had supported 14 million hiring processes in the quarter, up 44% year over year, the company said in its fiscal 2027 first-quarter results. Workday also said more than 4,000 customers had started using at least one Workday Illuminate agent. For a product leader, those numbers show adoption. For a sales team, they show momentum. For a CHRO defending next year’s budget, they look like evidence.

A finance partner will ask for a different file.

How many recruiter hours came back? Which hiring decisions improved? Did manager rework fall? Did candidate disputes rise? Did time-to-fill move because the agent helped, because requisitions changed, or because the market softened? How much did the company spend on integration, review, employee communication, legal evidence, vendor support, and retraining? If a renewal expands from one HR agent to a platform bundle, which operating result justifies the step-up?

Those questions have become harder because HR AI is no longer a small experiment in a side workflow. On May 28, Workday and Google Cloud expanded their partnership so Workday’s Sana Self-Service Agent could sit inside Gemini Enterprise, with HR and finance tasks moving into the daily workspace. On June 3, Microsoft said Infosys, TCS, and Wipro were scaling Microsoft 365 Copilot to more than 300,000 employees within six months, with Wipro citing an appraisal agent that reduced performance review effort by nearly 70%. ServiceNow, SAP, Oracle, ADP, and others are also putting agents near payroll, talent, employee service, recruiting, workforce planning, and manager work.

The product story is moving quickly. The measurement story is not.

SHRM’s 2026 State of AI in HR report found that 39% of organizations had adopted AI in the HR function and 62% had adopted AI somewhere in the organization or planned to do so soon. Yet only 16% of HR professionals said they used their own ROI metric to assess AI success. Fifty-six percent did not formally measure AI investment success at all.

Picture the renewal review 90 days before the contract date. HR brings the Workday agent report, a Copilot adoption slide, a ServiceNow control screenshot, and a list of managers who say drafts arrive faster. Finance brings the renewal increase, the consulting invoice, the integration spend, and the headcount plan. Legal asks whether an employee could challenge an AI-assisted decision. The HR analytics lead has to translate all of it into one answer: what changed in the workflow after the agent arrived?

The measurement gap SHRM found is not a research footnote. It is the renewal problem.

The scorecard has to be built before procurement starts negotiating. Once the renewal package is already on the table, usage screenshots become the easiest proof and hidden rework becomes a late objection. HR needs the harder record early: baseline workflow, post-agent workflow, human review load, correction cost, risk evidence, trust signal, and budget owner.

Over the past month, this site has followed HR AI’s movement from front-door tools to agent passports, fraud desks, work redesign, layoff planning, and human-agent capacity models. Today’s issue sits at the renewal checkpoint. HR has bought or inherited AI. The CFO will not accept vendor usage screenshots as proof forever.

A useful renewal review starts after use: what changed, for whom, and at what hidden cost?

May 21 Put Usage in the Renewal File

Workday’s fiscal 2027 first-quarter release matters because it showed HR AI crossing from product demo into operating volume.

The company reported subscription revenue of $2.06 billion, up 13.4% year over year, and said more than 80 million users were under contract. It pointed to AI demand through Workday Illuminate and said more than 4,000 customers were using at least one agent. Recruiting Agent’s 14 million hiring processes gave buyers a simple adoption proof point.

That number is powerful. It is also incomplete.

A hiring process is a unit of activity. It is not automatically a unit of business value. It can show that recruiters, candidates, managers, or workflows touched an agent. It cannot by itself show whether a company improved hiring quality, lowered recruiter rework, reduced bias exposure, shortened hiring-manager loops, or saved enough money to justify a larger contract.

This distinction will matter in renewal season.

The first AI cycle inside HR was often justified by capacity pressure. Recruiters faced more applications. HR service teams faced more employee questions. Managers had too many review packets, survey comments, goals, learning plans, and scheduling exceptions. Employee-service portals were clumsy. Payroll and benefits teams were under pressure to answer questions faster. AI promised relief.

Relief is measurable only if the company defines the unit.

For recruiting, the unit cannot be only applications processed or interviews scheduled. It should include qualified slate quality, hiring-manager acceptance, candidate withdrawal, fraud review, appeal volume, source quality, recruiter review time, and downstream performance signals. For employee service, the unit cannot be only case deflection. It should include first-contact resolution, reopened cases, sensitive escalations, pay corrections, employee satisfaction, policy accuracy, and manager time. For performance management, the unit cannot be only drafts generated. It should include manager editing time, employee objections, calibration changes, bias review, pay-impact disputes, and documentation quality.

Without those units, adoption can hide work.

An agent can process more items and still leave humans with harder exceptions. It can draft faster and still require more review. It can answer routine questions and still create trust problems when employees cannot tell whether the answer came from current policy or stale data. It can reduce one team’s workload and increase another team’s audit burden.

This is why a CFO should not ask only for a vendor dashboard. A CFO should ask HR to map the workflow before and after the agent, then show the net operating change.

The basic renewal file should have at least four rows.

Renewal row	What finance should ask HR to prove
Activity	How many cases, hiring steps, drafts, approvals, or employee questions touched the agent?
Operating outcome	Which time, cost, quality, risk, or trust metric changed after deployment?
Human cost	What review, exception, correction, training, and manager time did the workflow require?
Evidence	Which records show the before-and-after result, and who owns them?

Most companies already have fragments of this evidence. ATS data can show process time, source mix, candidate status, interview steps, and offer conversion. HR case systems can show deflection, escalation, reopen rates, and resolution time. Payroll and HRIS systems can show corrections. Learning systems can show completion and skill moves. Employee surveys can show trust. Finance can show license, services, consulting, and integration spend.

Those fragments often live in different rooms.

HR owns the workflow context. IT owns parts of the tool stack. Finance owns budget and renewal pressure. Legal owns risk language. Security owns access and audit posture. Managers own the final human explanation. Vendors own product telemetry. A renewal scorecard has to connect them before the negotiation begins.

If HR waits until the contract is on the table, usage will be the easiest thing to show. That is exactly why it can become the wrong thing to show.

SHRM Found the Missing Metric

SHRM’s State of AI in HR 2026 makes the measurement gap visible.

The survey covered 1,908 HR professionals. It found that HR adoption was real but uneven. Thirty-nine percent of organizations had adopted AI in the HR function. Another 7% intended to launch AI in HR during the year. AI use was most common in recruiting at 27%, HR technology at 21%, learning and development at 17%, and employee experience at 14%. Across the organization, 62% had adopted AI somewhere or intended to do so soon.

Then the report gets uncomfortable.

Only 16% of HR professionals said they used their own ROI metric to assess AI success. Fifty-six percent said they did not formally measure AI investment success. In the same report, HR professionals pointed to accuracy, privacy, ethics, job displacement, and lack of transparency as major concerns.

That combination defines the scorecard problem.

HR is close enough to the work to see the risks. It is not always equipped to prove the return. Finance can see the spend. It is not always close enough to interpret the work. Vendors can see usage. They are not accountable for every downstream employee decision, manager conversation, or payroll correction that follows.

No single ROI percentage can carry the whole case.

HR AI touches too many different workflows for that. A recruiting agent, a payroll assistant, a performance-review drafting tool, an employee-service agent, a learning recommendation engine, and a scheduling optimizer do not create value in the same way. They do not create risk in the same way either.

The useful answer is a disciplined measurement model.

It should separate five categories that often get blended together:

Category	Good metric examples	Bad proxy
Productivity	Cycle time, review minutes, case handling capacity, manager editing time	Logins
Quality	Error rate, reopened cases, slate quality, corrected decisions, calibration changes	Outputs generated
Trust	Employee satisfaction, candidate withdrawal, appeal volume, explanation requests	Chat satisfaction alone
Risk reduction	Bias-audit readiness, record completeness, access discipline, policy accuracy	Security badge in vendor deck
Economic return	Net savings after licenses, services, integration, rework, training, and exception staffing	Gross headcount avoided

The bad proxies are tempting because they are easy. They also favor the seller.

Usage can rise because the interface is mandatory. Messages can rise because employees are confused. Case deflection can rise because employees give up. Drafts can rise because managers are told to use the tool. Automation rate can rise while rework moves to HRBPs, payroll, employee relations, or legal. A product can look more valuable precisely because it created more machine-readable activity.

That is why measurement has to follow the buyer’s side of the transaction: what happened to employees, candidates, and HR teams after the tool produced its output.

If a payroll assistant answers routine questions correctly, employees get time back and HR service capacity improves. If the same assistant gives confusing answers about withholding, corrections may become more expensive. If an AI recruiting screen reduces first-pass work while increasing candidate appeals, the net result depends on review capacity and dispute cost. If a performance-review tool reduces writing time while making employees less confident in the process, HR has to price the trust loss.

SHRM’s 56% figure matters because it shows many HR teams are caught between two states. They are no longer in a pre-AI world. They have not yet built the measurement discipline that renewal and regulation will demand.

The CFO’s role is to force that discipline without flattening HR work into one savings line.

The CHRO’s role is to defend the messy parts of value: better decisions, lower risk, fairer explanation, improved employee experience, faster service, and manager capacity. Those can be measured. They just cannot be measured only through the vendor console.

Usage Dashboards Are Not a Business Case

The enterprise AI market is training buyers to look at dashboards.

ServiceNow’s AI Control Tower expansion used a CFO-friendly set of verbs: discover, observe, govern, secure, and measure. The company described visibility across AI deployed in the enterprise, cost tracking, ROI dashboards, least-privilege enforcement, and shutdown controls. For CIOs and finance leaders, this is a useful response to agent sprawl.

Microsoft’s June 3 announcement about Infosys, TCS, and Wipro shows the same dashboard pressure at workforce scale. Microsoft said the three Indian IT services firms would scale Microsoft 365 Copilot to more than 300,000 employees within six months. Infosys CEO Salil Parekh framed Copilot as part of enterprise AI transformation. TCS CEO K Krithivasan emphasized scale across a large workforce. Wipro CEO Srini Pallia connected Copilot to productivity and decision-making. Microsoft also cited Wipro’s agent for performance appraisals, which Wipro said reduced performance-review effort by nearly 70%.

The Wipro numbers are unusually useful because they show both the strength and the limit of activity proof. Microsoft said Wipro had reached more than 95% monthly active usage of Copilot, generated 7.5 million prompts each month, averaged 23 actions per user per week, saved more than 250,000 FTE days every quarter, and had more than 29,000 end-user-developed agents. Those metrics show scale. HR still has to ask what changed in appraisal quality, manager coaching time, employee confidence, calibration accuracy, and dispute volume after the appraisal agent entered the process.

These are not small pilots. They are enterprise seat rollouts.

They also show why HR needs a stronger scorecard.

An IT services company can point to seat deployment, adoption, developer productivity, document drafting, appraisals, and internal process acceleration. HR leaders inside other enterprises will face similar pressure. If Copilot, Gemini Enterprise, Workday agents, ServiceNow control layers, SAP Joule assistants, Oracle role agents, or ADP Assist become part of daily work, HR will inherit a stream of activity metrics.

Some will be useful. None will be enough.

Vendor telemetry is not false. It is local. It sees what the product can see. A Microsoft usage report can see Copilot interactions. A Workday agent can see Workday-related activity. A ServiceNow control layer can see agents, workflows, costs, and governance signals across connected systems. SAP, Oracle, ADP, Greenhouse, iCIMS, UKG, and other HR technology providers can see their own flows.

HR outcomes cross those boundaries.

A recruiting decision may begin in a sourcing tool, move into an ATS, touch a screening agent, generate a manager summary in a workspace tool, create an interview note, trigger a background-check vendor, and eventually show up in payroll, onboarding, and performance data. An employee-service issue may start as a chat, become a case, require policy interpretation, touch payroll, and end with a manager conversation. A performance-review workflow may include collaboration documents, AI summaries, Workday or SuccessFactors records, compensation calibration, employee comments, and legal retention.

No single product dashboard owns the whole result.

That is why HR should build a measurement layer around workflows, not vendors. The scorecard should begin with the employee or manager job to be done, then attach the systems and evidence. A vendor can contribute metrics. It should not define success alone.

A practical example is performance review automation.

A tool can report that it generated 2,000 draft summaries and saved an estimated number of hours. HR should ask for the follow-up rows: how much did managers edit the drafts, how many summaries required HRBP intervention, how many employees challenged language, whether calibration changed, whether pay equity review flagged issues, whether review completion came earlier, whether employees trusted the process more or less, and whether managers used the freed time for coaching.

The appraisal draft is only a step.

The business case depends on whether the performance process became more accurate, timely, fair, and useful. If the tool saves writing time but increases appeals or weakens trust, the renewal conversation changes. If it saves writing time and improves coaching quality with no increase in disputes, the renewal file gets stronger.

The same logic applies to employee service.

An agent can show 60% case deflection. HR should ask what happened to the 40% that still reached a human, how many answers were reopened, which policy areas created confusion, whether payroll corrections rose or fell, whether employees were more satisfied, and whether HR specialists were spending less time on routine work or more time on hard cases. A deflected case can be a success. It can also be a silent failure if the employee stops asking.

Dashboards show movement. Scorecards show consequence.

Workflows Deserve Their Own P&L

A serious HR AI budget should not start with the tool list. It should start with a workflow P&L.

Finance already understands P&L discipline. Revenue, cost, margin, and variance tell a story about a business line. HR AI requires a smaller version at workflow level. It should name the work, the baseline, the automation cost, the human cost, the risk cost, and the outcome.

This would change how HR talks about AI.

Instead of saying, “We deployed an employee-service agent,” HR could say: “We deployed the agent to benefits and payroll questions in North America. Baseline was 18,000 monthly cases, 42% first-contact resolution, 11% reopen rate, and a 3.8 employee satisfaction score. After deployment, routine policy cases fell 35%, reopened payroll cases rose 2 points, specialist time moved from routine answers to corrections, and manager questions rose in two states after a tax-policy update. Net savings were positive only after the third month because training and correction work were higher than forecast.”

That kind of sentence is harder to write. It is also the sentence finance needs.

The same model can work for recruiting:

Workflow P&L item	Recruiting example
Baseline	Applications per recruiter, hiring-manager response time, slate acceptance, candidate withdrawal
AI cost	Vendor fee, integration work, model or credit usage, evidence export support
Human cost	Recruiter review, fraud review, candidate communication, appeal handling
Outcome	Time-to-fill, slate quality, offer acceptance, early attrition, dispute rate
Risk	Adverse-impact review, notice compliance, decision records, vendor audit support

It can work for performance management:

Workflow P&L item	Performance-review example
Baseline	Manager writing time, completion delays, calibration changes, employee objections
AI cost	Assistant license, workspace usage, HRIS integration, model governance
Human cost	Manager review, HRBP review, employee explanation, edits, corrections
Outcome	Review completion, coaching quality, pay calibration accuracy, employee trust
Risk	Bias review, retention, explanation records, legal discovery support
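For an analytics team, the workflow P&L items in these tables can be held in one small record per workflow. The Python sketch below is one possible shape, not a standard: the class name WorkflowPnL, its field names, and every number in the example are hypothetical. It assumes the costs and gross savings have already been priced in the same currency and period.

```python
from dataclasses import dataclass


@dataclass
class WorkflowPnL:
    """One workflow's before-and-after record (hypothetical structure)."""
    workflow: str
    baseline: dict[str, float]   # metrics measured before deployment
    outcome: dict[str, float]    # the same metrics measured after deployment
    gross_savings: float         # claimed savings before any costs
    ai_cost: float               # vendor fee, integration, usage
    human_cost: float            # review, corrections, explanation
    risk_cost: float             # bias review, records, audit support

    def net_value(self) -> float:
        """Gross savings minus AI, human, and risk costs."""
        return self.gross_savings - (self.ai_cost + self.human_cost + self.risk_cost)

    def metric_changes(self) -> dict[str, float]:
        """Change per metric, only where both a baseline and an outcome exist."""
        return {name: self.outcome[name] - before
                for name, before in self.baseline.items()
                if name in self.outcome}

    def unmeasured(self) -> list[str]:
        """Baseline metrics with no post-deployment reading yet."""
        return sorted(name for name in self.baseline if name not in self.outcome)


# Hypothetical numbers, for illustration only.
reviews = WorkflowPnL(
    workflow="performance reviews",
    baseline={"manager_writing_minutes": 90, "employee_objections": 40,
              "completion_delay_days": 12},
    outcome={"manager_writing_minutes": 45, "employee_objections": 55},
    gross_savings=500_000, ai_cost=200_000, human_cost=150_000, risk_cost=50_000,
)
print(reviews.net_value())       # 100000
print(reviews.metric_changes())  # {'manager_writing_minutes': -45, 'employee_objections': 15}
print(reviews.unmeasured())      # ['completion_delay_days']
```

In this invented case, writing time fell while objections rose and one baseline metric has no follow-up reading, which is the kind of mixed result a single savings figure would hide.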

The workflow P&L exposes a key fact: AI value often depends on the human work that remains.

The agent may reduce search and drafting time. The company may still need senior humans to review sensitive cases, explain decisions, and handle exceptions. That is not a failure. It is the operating design. Failure happens when HR sells the tool as replacement capacity while the workflow still depends on hidden human work.

Gartner’s May 2026 warning about autonomous business and AI layoffs fits here. Gartner said AI layoffs may create budget room but do not deliver returns by themselves. A workflow P&L forces the company to test that warning before another cut or renewal.

If a company cuts HR service capacity because an agent can answer employee questions, the P&L should show whether payroll corrections, employee relations cases, or specialist escalations rise. If it cuts recruiting coordinators because AI can schedule and summarize, the P&L should show whether recruiters or hiring managers absorb more exception handling. If it reduces L&D support because an AI tutor can recommend courses, the P&L should show whether skills actually move into roles that need them.

A workflow P&L also gives vendors a fairer test.

Good AI products should not be punished because old HR processes were poorly measured. If a vendor really reduces cycle time, improves quality, lowers risk, or gives managers useful capacity, the scorecard should make that visible. It can also separate vendor value from internal change-management failure. A tool may be sound while the rollout lacks training, data cleanup, or manager time.

That distinction matters in negotiation.

A buyer with no scorecard can only argue from budget pain or dissatisfaction. A buyer with a workflow P&L can say where the product works, where it fails, what support is missing, which metrics justify renewal, and which modules should be paused. The vendor can respond with evidence, services, price changes, product fixes, or a narrower scope.

The renewal becomes a business review instead of a usage review.

The Manager Rework Line

Every HR AI scorecard requires a manager rework line.

Microsoft’s 2026 Work Trend Index argued that AI impact depends heavily on organizational factors, manager support, culture, and work measurement. The report drew on 20,000 AI users across ten markets and more than 100,000 Microsoft 365 Copilot chats. Its practical implication for HR is direct: AI value can stall if managers become the unmeasured review layer.

Managers are where many HR AI promises either land or break.

A recruiter can receive an AI-generated candidate summary, but the hiring manager still decides whether the slate is credible. A performance assistant can draft review language, but the manager still owns the conversation. A scheduling optimizer can suggest coverage, but the frontline manager deals with callouts, worker preferences, overtime, and employee complaints. A learning system can suggest skills, but the manager decides whether the employee gets stretch work. An employee-service agent can answer a policy question, but the manager often becomes the first human asked to interpret it.

This creates rework that most AI business cases undercount.

Manager rework includes checking facts, rewriting drafts, explaining recommendations, documenting overrides, reopening cases, resolving disputes, and translating AI output into a human decision. Some of that work existed before AI. Some increases because the agent moves more work into a manager’s queue.

Measure it.

Not perfectly. Enough to manage.

Start with high-volume, high-risk workflows. For each, HR can sample manager review time before and after deployment. It can track how often managers edit AI-generated drafts, reject recommendations, request HRBP help, reopen cases, or escalate employee questions. It can survey managers on whether AI reduced work, changed work, or moved work from specialists to them. It can compare teams that received training with teams that did not.
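The sampling described above does not need heavy tooling. The Python sketch below is a rough illustration that assumes HR has a small sample of review times before and after deployment, plus a flag per AI-generated draft. The function name rework_summary, the field names, and the sample values are all hypothetical.

```python
from statistics import mean


def rework_summary(minutes_before, minutes_after, drafts):
    """Summarize sampled manager rework for one workflow.

    minutes_before / minutes_after: sampled review minutes per item.
    drafts: one dict per AI-generated draft with boolean flags
            'edited', 'rejected', and 'escalated' (hypothetical field names).
    """
    if not (minutes_before and minutes_after and drafts):
        raise ValueError("each sample needs at least one observation")
    total = len(drafts)
    return {
        # Negative means managers spend less time per item after deployment.
        "review_minutes_change": mean(minutes_after) - mean(minutes_before),
        "edit_rate": sum(d["edited"] for d in drafts) / total,
        "reject_rate": sum(d["rejected"] for d in drafts) / total,
        "escalation_rate": sum(d["escalated"] for d in drafts) / total,
    }


# Hypothetical sample, for illustration only.
sample = rework_summary(
    minutes_before=[30, 40],
    minutes_after=[20, 30],
    drafts=[
        {"edited": True, "rejected": False, "escalated": False},
        {"edited": True, "rejected": False, "escalated": True},
        {"edited": False, "rejected": True, "escalated": False},
        {"edited": False, "rejected": False, "escalated": False},
    ],
)
print(sample)
```

A real sample would be larger and split by trained and untrained teams, but even a few dozen observations per workflow give finance something better than an estimate from the vendor console.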

Call it capacity planning, not bureaucracy.

A company that claims AI saved 10,000 manager hours should know where those hours went. Did managers coach more? Did they reduce after-hours work? Did employee response times improve? Did they spend the time checking AI output? Did they lose time explaining decisions employees did not trust? Did HR centralize some review work to protect managers?

The answer changes the business case.

If AI reduces manager writing time and improves employee conversations, the renewal file is strong. If AI reduces writing time but increases review anxiety and employee disputes, HR needs a different rollout model. If managers ignore the output because they do not trust it, adoption is not value. If managers approve output without review because they are overloaded, adoption becomes risk.

This is also where reskilling becomes real.

Many AI transformation plans say employees and managers will move toward judgment work. That is plausible. It is not automatic. Judgment work requires time, training, context, authority, and clear escalation paths. A manager cannot be asked to supervise agent output, explain decisions, protect employee trust, and maintain productivity without a capacity plan.

The rework line should sit next to the license line.

Finance should see both. A tool that costs $1 million and creates $2 million in verified labor value may deserve renewal. A tool that costs $1 million and moves $1.5 million of unbudgeted review work to managers has a weaker case. A tool that reduces risk without large time savings may still be worth buying, but HR should make that argument clearly.
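The arithmetic behind that comparison is simple enough to write down. The Python sketch below uses the illustrative dollar figures from the paragraph above; the function names are hypothetical, and the model assumes verified labor value and unbudgeted rework have been priced in the same period.

```python
def net_renewal_value(verified_value: float, license_cost: float,
                      rework_cost: float = 0.0) -> float:
    """Verified value minus the license line and the rework line."""
    return verified_value - license_cost - rework_cost


def break_even_value(license_cost: float, rework_cost: float = 0.0) -> float:
    """Verified value a tool must create just to cover license plus rework."""
    return license_cost + rework_cost


# First tool: $1 million license, $2 million verified labor value.
print(net_renewal_value(2_000_000, 1_000_000))  # 1000000.0

# Second tool: $1 million license plus $1.5 million of unbudgeted manager rework.
print(break_even_value(1_000_000, 1_500_000))   # 2500000
```

Under these assumptions the second tool needs more than $2.5 million in verified value before it returns anything, which is why leaving the rework line off the sheet flatters the case.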

The point is to stop treating manager time as free.

It is not.

Regulators Measure Evidence, Not the Demo

The HR AI scorecard is also becoming a compliance artifact.

Employment AI sits in a different risk class from general office productivity. Recruiting, promotion, performance, compensation, scheduling, internal mobility, and employee-service decisions can affect jobs, pay, opportunity, working conditions, and legal rights. Regulators do not care that an agent was popular in the dashboard if the company cannot explain an affected decision.

Colorado’s 2026 automated decision-making framework, California’s employment automated-decision rules, New York City’s Local Law 144, Illinois AI employment notice rules, and the EU AI Act all push employers toward notice, recordkeeping, bias review, human review, explanation, or risk controls in employment-related AI. The details differ. The direction is consistent.

HR will need records that connect activity to decisions.

A renewal scorecard should therefore include evidence readiness. That means HR should ask whether each AI workflow can produce the records needed for an employee complaint, candidate dispute, pay correction, audit, discovery request, or regulator inquiry. It should know where prompts, model versions, configuration, source data, reviewer logs, human overrides, notices, and final decisions live.
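One way to test evidence readiness is a plain inventory check per workflow. The Python sketch below takes the record types listed above and reports which ones have no known location. The identifiers, the function name missing_records, and the example inventory are hypothetical; a real check would also cover retention periods and export rights.

```python
# Record types named in the text; the identifiers themselves are hypothetical.
REQUIRED_RECORDS = (
    "prompts", "model_versions", "configuration", "source_data",
    "reviewer_logs", "human_overrides", "notices", "final_decisions",
)


def missing_records(located: dict[str, str]) -> list[str]:
    """Return required record types with no known storage location."""
    return [name for name in REQUIRED_RECORDS if not located.get(name)]


# Hypothetical inventory for a recruiting workflow.
recruiting = {
    "prompts": "vendor audit export",
    "model_versions": "vendor release notes",
    "final_decisions": "ATS",
    "notices": "",  # location not yet confirmed
}
print(missing_records(recruiting))
```

An empty result does not prove the records are usable, only that someone has named where they live. A non-empty result is a concrete list to take to the vendor before renewal.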

Evidence quality does more than satisfy legal hygiene.

Evidence quality affects ROI. A tool that cannot export useful records may save time in the front office and create cost in legal, employee relations, or compliance. A tool that reduces routine work but leaves HR unable to explain an adverse employment outcome may become more expensive after the first dispute.

The scorecard should separate three kinds of evidence:

Evidence type	Purpose
Operating evidence	Shows whether the workflow improved time, cost, quality, and rework
Decision evidence	Shows how an employment-related recommendation or action was made and reviewed
Control evidence	Shows access, retention, model changes, vendor support, and shutdown or correction paths

Most renewal discussions overemphasize operating evidence. That is understandable because it supports the budget. But decision and control evidence may determine whether the product can stay in high-risk HR workflows.

ServiceNow’s control language, Workday’s agent system of record positioning, Microsoft Agent 365 governance, Oracle’s role-based HCM agents, and SAP’s Joule assistants all point toward this new buyer expectation. The market is not only asking agents to do more. It is asking them to leave a usable trail.

That trail should feed the scorecard.

If a recruiting agent supports 14 million hiring processes, a buyer should ask how many of those processes produced reviewable decision evidence, how long records are retained, how audit exports work, which vendor support rights are included, how human review is logged, and whether records can be linked to outcome metrics. If an employee-service agent answers payroll questions, the buyer should ask how incorrect answers are detected, corrected, and reported. If a performance tool drafts review language, the buyer should ask how edits, overrides, and employee responses are retained.

The renewal file should make a simple distinction.

Low-risk productivity workflows can be measured mainly through time, usage, quality, and satisfaction. Employment decision workflows need an evidence layer as well. A tool can be valuable in one tier and not ready for the other.

That distinction gives HR more room to deploy AI responsibly. It can expand safe use cases while slowing or redesigning workflows that affect pay, promotion, scheduling, hiring, or termination. It can ask vendors for stronger evidence support without rejecting every assistant. It can also defend AI investments that reduce risk, even when direct labor savings are smaller.

Risk reduction is a return. HR just has to measure it.

Renewal Reviews Need a Harder Sheet

The better HR AI renewal review should start 90 days before the contract date.

Not with a vendor pitch. With an internal evidence pull.

For each AI-enabled workflow, HR, finance, IT, legal, security, and the business owner should sit with the same sheet. The sheet should show baseline performance, product usage, workflow outcomes, human review cost, error and correction data, employee or candidate trust signals, evidence readiness, and vendor support performance. It should also show which metrics are missing.

The missing rows matter.

If HR cannot measure manager rework, the next rollout phase should include sampling. If it cannot connect agent usage to outcome quality, the data model needs work before expansion. If it cannot export decision records, the workflow may need a lower-risk scope. If it cannot price correction work, finance should not accept gross savings. If employees do not know how to challenge an AI-assisted decision, adoption is incomplete.

The renewal decision can then split into four choices:

Decision	When it makes sense
Expand	The workflow shows net value, manageable human cost, trust, and evidence readiness
Renew narrowly	Some workflows work, but others need redesign or stronger vendor support
Pause	Usage is high but outcome, risk, or rework evidence is weak
Exit	The product creates more cost, risk, or trust damage than value
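The four choices can be read as a rule over the workflows a product touches. The Python sketch below is one possible reading of the table, not a policy: the three status labels, the function name renewal_decision, and the order of the checks are assumptions, and a real review would weigh workflows by volume and risk.

```python
from enum import Enum


class Status(Enum):
    """Hypothetical per-workflow assessment drawn from the renewal sheet."""
    WORKS = "works"        # net value, manageable human cost, trust, evidence readiness
    UNPROVEN = "unproven"  # usage may be high, but outcome, risk, or rework evidence is weak
    HARMFUL = "harmful"    # more cost, risk, or trust damage than value


def renewal_decision(statuses: list[Status]) -> str:
    """Map workflow assessments for one product to a renewal choice."""
    if not statuses:
        raise ValueError("at least one workflow is required")
    if all(s is Status.WORKS for s in statuses):
        return "Expand"
    if any(s is Status.WORKS for s in statuses):
        return "Renew narrowly"
    if any(s is Status.UNPROVEN for s in statuses):
        return "Pause"
    return "Exit"


print(renewal_decision([Status.WORKS, Status.UNPROVEN]))    # Renew narrowly
print(renewal_decision([Status.UNPROVEN, Status.HARMFUL]))  # Pause
```

The ordering is the design choice that matters here: a product with one proven workflow is renewed narrowly, not paused, and exit is reserved for the case where nothing is working or even unproven.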

This makes the conversation harder for everyone.

Vendors lose the comfort of activity metrics alone. HR loses the comfort of saying AI is strategic without proving which work changed. Finance loses the comfort of treating headcount reduction as the whole return. IT loses the comfort of measuring deployment without workflow impact. Managers lose the ability to absorb invisible review work without naming it.

That is a better kind of discomfort.

It turns HR AI from a belief system into an operating review.

The coming year will produce more agent announcements. Workday will keep embedding agents into HR and finance work. Microsoft will keep pushing Copilot and agent infrastructure across large workforces. ServiceNow will keep selling control and measurement across systems. SAP, Oracle, ADP, UKG, Greenhouse, iCIMS, and others will keep moving AI closer to employee, manager, recruiter, payroll, and talent workflows. Each vendor will bring metrics.

HR should bring its own.

The most useful scorecard will not be elegant. It will be specific. It will name the workflow, the baseline, the human work left behind, the operating result, the trust signal, the risk evidence, and the economics after rework. It will let a CHRO tell the CFO where AI deserves more money and where it does not.

That is the renewal discipline HR needs now.

A year ago, many buyers were still asking whether employees would use HR AI. By mid-2026, the answer is increasingly yes. Employees, managers, recruiters, and HR teams will use it when it is placed inside the tools where work already happens.

The harder question arrives after the usage chart.

When finance asks what changed, HR needs more than a screenshot. It needs a scorecard that can survive a renewal meeting, a manager review, an employee dispute, and an audit file.

The contract may renew on usage. The operating model cannot.

Published June 13, 2026.