Every AI agent vendor has a case study. Here is what they leave out.

By Moe Chizari / May 14, 2026 / AI & Automation

Klarna’s AI customer service agent was for a while the most-cited success story in enterprise AI. It was reported to be doing the work of around 700 full-time agents, was credited with a projected $40 million profit improvement, and became the case study every vendor pulled out to prove their model worked. Then Klarna quietly began hiring humans back. The most famous AI agent case study in the world ended in a public reversal, and once you start looking, the rest of the case studies are not as solid as the press releases suggest either.

That matters, because the AI agent pitch is now the default closing slide in every enterprise software meeting. The buyer needs a way to read the case studies that arrive in those meetings, because the gap between the headline number and the operating reality is where Australian mid-market AI projects are about to spend money they will not recover.

What actually happened at Klarna

Klarna deployed a customer service agent built on OpenAI’s models in early 2024. The early figures were striking. The agent was handling the workload equivalent of a large support function, reportedly cutting average resolution time from eleven minutes to under two, and apparently delivering at a fraction of the cost of human agents. The CEO became one of the loudest public advocates for full AI customer service replacement, and the case study circulated globally for the next eighteen months.

By 2025 the story had shifted. Klarna reversed its AI-only customer service strategy because complex and emotionally charged queries required human judgment the AI agent could not reliably supply. The agent had handled volume, but the cases that mattered most, the ones where a customer was angry or confused or in financial distress, were the ones it was getting wrong. Klarna started hiring people back. The most-cited proof of concept for AI agent replacement of a human function turned out to require a human function alongside it.

This is not a story about Klarna failing. Klarna ran a real experiment at scale and learned something genuine, which is more than most of the companies citing them have done. The point is what the case study collapses into when you read it from the buyer’s seat. The headline says an agent did the work of 700 people. The operating reality is that it handled the easy 80% of the volume, and the difficult 20% needed humans, governance and judgment to survive contact with real customers.

The four shapes of agent that look easy to buy

Once you start reading vendor case studies through Klarna’s footnote, a pattern emerges. The agents that read as low-effort to buy fit one of four shapes, and it is worth recognising which one is in the room.

Single-vendor agents on single-vendor data. Salesforce Agentforce works inside Salesforce, on Salesforce data, with Salesforce’s security model. HubSpot’s agents work inside HubSpot. Microsoft Copilot works inside Microsoft 365. These are real, they work, and they are by far the easiest agents to buy safely. They work because the data and security work was done years ago when you implemented the underlying platform. The agent inherits all of it. You just do not see the bill again.

Read-only or research agents. A research agent that summarises documents, an internal search tool that retrieves from a knowledge base, an agent that drafts a customer email for a human to send. These are safe by scope rather than safe by design. The blast radius is small because the agent cannot actually do anything destructive. If the answer is wrong, the human catches it before it reaches the customer or the database. (A short sketch of what that scoping looks like in practice follows the four shapes.)

Narrow vertical agents with hidden implementation work. The customer service agents that handle Tier 1 support for a smart sprinkler company, a SaaS onboarding flow, an FAQ-heavy support function. These look like packaged products in the case study, but the work that makes them function lives in a knowledge base someone spent months curating. The vendor’s marketing does not show that work. Industry data from 2026 finds that 62% of underperforming AI customer service projects fail because of insufficient data preparation, not platform selection. The case studies that worked are the ones where the data work was done.

Built in-house by very large teams. JPMorgan running hundreds of agentic AI use cases in production, Starling Bank’s in-app assistant, Amazon and Uber’s internal agent systems. These appear in case study lists alongside the smaller examples, but they are not products you can buy. They are products that very large companies built with engineering, risk, compliance and product teams that no mid-market business can replicate.

The pattern that runs through all four is that the data and security work always happens somewhere. It is either done inside the vendor’s platform, done by the customer before deployment, designed around by limiting the agent’s scope, or done by a large in-house team. The case studies that read as “easy to buy” are not actually easy. They are easy in the dimension the case study chose to highlight.
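
To make the second shape concrete, here is a minimal sketch in Python of what safe by scope means in practice. The Agent class and tool names are hypothetical, not any vendor’s API; the point is that a read-only agent is only ever handed retrieval and drafting tools, so its worst failure is a wrong draft a human catches before it ships.

    # A minimal, hypothetical sketch of safe-by-scope. The Agent class and
    # tool names are illustrative, not a real framework's API.
    from typing import Callable

    def search_knowledge_base(query: str) -> str:
        # Stub retrieval: in practice this queries your curated knowledge base.
        return f"Top results for: {query}"

    def draft_reply(context: str) -> str:
        # Produces a draft only; a human sends it, or does not.
        return f"DRAFT, pending human approval: {context}"

    def send_email(to: str, body: str) -> None:
        # A write action, deliberately never handed to the read-only agent.
        raise PermissionError("write actions are out of scope for this agent")

    class Agent:
        def __init__(self, tools: dict[str, Callable]):
            # The agent can only invoke the tools it is constructed with.
            self.tools = tools

    # Safe by scope: retrieval and drafting only, no write tools.
    research_agent = Agent(tools={
        "search": search_knowledge_base,
        "draft": draft_reply,
    })

The safety here is structural rather than behavioural. There is no prompt, however adversarial, that lets the agent call a tool it was never given.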

Where the work goes when you do not see it

For an Australian mid-market business taking a vendor pitch, the practical question is not whether an agent can work safely. Agents work safely all the time. The practical question is who is going to do the work that makes yours work safely, and whether you know you are paying for it.

The work, when you list it out, is consistent across every successful agent deployment we have seen. The knowledge base or data layer the agent draws on has to be clean, current, and permissioned. Someone has to decide what actions the agent can take autonomously and what requires a human in the loop. Someone has to write the policy for what happens when the agent makes a wrong decision, and the audit trail to prove what it did. Someone has to handle the integration into the systems the agent needs to read from and write to, and the credentials and access controls that govern that. Someone has to monitor the agent in production and roll back fast when it drifts. Someone has to retrain staff on what their job looks like now that an agent is doing part of it.
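
Here is a minimal sketch, again in Python and again hypothetical, of just one item on that list: the decision about what the agent may do autonomously, wired to an audit trail. The action names and the policy table are illustrative assumptions, not a product’s API; the point is that this table has to exist and someone at your end has to own it.

    # A minimal, hypothetical sketch of an agent action gate with an
    # append-only audit trail. Action names and the policy table are
    # illustrative assumptions, not any vendor's API.
    import json
    import time
    import uuid
    from enum import Enum

    class Autonomy(Enum):
        AUTO = "auto"              # agent may act without review
        HUMAN_APPROVAL = "human"   # queue for a person before executing
        FORBIDDEN = "forbidden"    # agent may never take this action

    # Someone has to decide, and keep deciding, what goes in this table.
    POLICY = {
        "send_draft_reply": Autonomy.AUTO,
        "issue_refund": Autonomy.HUMAN_APPROVAL,
        "close_account": Autonomy.FORBIDDEN,
    }

    def audit(record: dict) -> None:
        # The trail that proves what the agent did, and when.
        record["id"] = str(uuid.uuid4())
        record["ts"] = time.time()
        with open("agent_audit.jsonl", "a") as f:
            f.write(json.dumps(record) + "\n")

    def dispatch(action: str, payload: dict, approval_queue: list) -> str:
        level = POLICY.get(action, Autonomy.FORBIDDEN)  # unknown action: deny
        if level is Autonomy.AUTO:
            audit({"action": action, "payload": payload, "outcome": "executed"})
            return "executed"
        if level is Autonomy.HUMAN_APPROVAL:
            approval_queue.append((action, payload))
            audit({"action": action, "payload": payload, "outcome": "escalated"})
            return "escalated"
        audit({"action": action, "payload": payload, "outcome": "blocked"})
        return "blocked"

    # Example: a refund request is queued for a human, and logged either way.
    queue: list = []
    dispatch("issue_refund", {"order": "12345", "amount": 49.00}, queue)

The design choice that matters is the default deny on unknown actions. The agent earns autonomy one action at a time, and the log is what lets you answer the auditor, or the angry customer, afterwards.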

None of that is glamorous. None of it shows up in a case study. All of it is the actual cost of an AI agent that does not become your next Klarna. If your vendor is not telling you who is doing this work, you are doing it. If you are not doing it, no-one is, and the agent is running on optimism.

What this means for the Australian mid-market

The same dynamic that excludes the Australian mid-market from the Anthropic and OpenAI Forward Deployed Engineer model also excludes you from the easiest version of the agent case studies. The vendor agents that work cheaply work because they are tightly scoped to a platform you already own. The vendor agents that promise to do everything work because there is a large in-house team behind them that you do not have. Anything in between needs operating work to make it safe, and that operating work is the thing that does not appear in the brochure.

The reasonable buying posture for the Australian mid-market is to assume that any agent pitch you take has a hidden cost line for governance, data preparation, workflow redesign, training and monitoring, and to ask the vendor to put it on the page. If they cannot quantify it, they have not done it before. If they say it is included, ask them what specifically is included, and who at their end will own it after the engagement closes. The answer is rarely as clean as the case study.

This is the same argument we made about the diffusion gap. Adoption is the easy bit. Diffusion, the actual change in how work gets done, is governed by people, process and accountability work that does not have a sales line item. Agents are the sharp version of the same problem. The headlines tell you what the agent does. The footnotes tell you what the agent costs, and the case studies are mostly headline.

What you should do now

Read every case study from the buyer’s seat. When a vendor presents an agent case study, ask three questions before anything else. Who did the data preparation, and how long did it take. What does the agent escalate to a human, and what happens when it does not. What did the customer’s first six months of production operation look like, including the things that went wrong. If the vendor cannot answer those, the case study is marketing, not evidence.

Match the agent to its safest shape. If you are buying your first agent, buy one of the shapes that genuinely work cheaply. A single-vendor agent on data you already trust. A read-only research or summarisation agent with a human approving every output. A narrow Tier 1 customer service agent with a clean handover to a human. Skip the cross-system autonomous workflow agent until you have governance, change management and an audit trail working on the easier ones.

Get the operating layer in place before you commit. The agents that survive contact with real customers do so because the operating work was done first. Acceptable use, escalation rules, audit trail, monitoring, training, vendor management. Without those, the case study you write at the end of the year will be Klarna’s, not the one you wanted. Our managed AI and AI governance work is built around exactly this, and a free AI readiness review will tell you which agent shape is actually right for your business this quarter.

Frequently asked questions

Why did Klarna reverse its AI customer service agent?

Klarna reversed its AI-only customer service strategy because complex and emotionally charged customer queries required human judgment that the AI agent could not reliably supply. The agent was handling volume effectively, but the cases that mattered most, particularly those involving angry, confused or financially distressed customers, were the ones it was getting wrong. Klarna began hiring human staff back to handle those cases alongside the agent.

Are AI agents safe for Australian mid-market businesses?

Some shapes of AI agent are safe to deploy quickly. Single-vendor agents on single-vendor data, read-only research and summarisation agents, and narrow Tier 1 customer service agents with clean human escalation are all reasonable first moves. Cross-system autonomous workflow agents are not, until the operating layer of governance, audit, monitoring and training is in place. The risk in any agent deployment is rarely in the technology and almost always in the operating model around it.

What is missing from most AI agent case studies?

Most AI agent case studies leave out the cost and effort of the work that made the agent function: data preparation, knowledge base curation, integration, governance, audit trail, monitoring, change management and human escalation handling. The headline numbers describe what the agent does. The footnotes, when they exist, describe what the customer had to do to make the agent work. Industry research finds that 62% of underperforming AI customer service projects fail because of insufficient data preparation, not platform selection.

What questions should I ask a vendor offering an AI agent?

Three questions cut through most agent pitches. Who did the data preparation, and how long did it take. What does the agent escalate to a human, and what happens when it does not. What did the customer’s first six months of production operation actually look like, including the things that went wrong. If the vendor cannot answer those clearly, the case study is marketing and the engagement is unproven.

Which AI agents are safest to deploy first?

The safest shapes for a first agent deployment are single-vendor agents on single-vendor data (such as Salesforce Agentforce on Salesforce data or Microsoft Copilot on Microsoft 365 data), read-only research and summarisation agents with human review, and narrow Tier 1 customer service agents with clean human escalation. Cross-system autonomous agents should wait until governance and monitoring are in place.

Considering an AI agent for your business?

Our Perth-based team runs free AI readiness reviews for Australian mid-market businesses. We will tell you which shape of agent is the right first move, what the case study really costs, and what operating work has to happen before you sign anything.

Book a Free AI Readiness Review

About the Author
Written by Moe Chizari, Chief Executive Officer of Epic IT, a managed IT, cyber security and AI partner for Australian mid-market businesses, with offices in Perth, Sydney and Brisbane. Moe brings 17 years across financial markets, treasury and technology, including five years at Bravura Solutions running enterprise software delivery and five years inside Group Treasury at Westpac and Macquarie leading APRA-regulated programmes (APS-117 IRRBB, APS-210 LCR & Capital Transformation). He holds a Bachelor of International Business from RMIT University, is a certified Project Management Professional (PMP), and an AFMA Diploma of Financial Markets graduate.

Further Reading


Forward Deployed Engineer: what the label actually means
