What "Real Cost" Actually Means
Companies that have run agents in production long enough to know the real cost include Hello Fresh, Genesys, and, in logistics, TransGlobal. The numbers:
Those are real numbers. The teams that hit them share one trait: they started with a specific workflow, measured obsessively, and expanded only after proving ROI.
Where Production Differs From Pilots
Pilots are controlled. Production is chaotic.
In a pilot, you have a bounded dataset, a narrow scope, and an engineer watching every run. In production, you get:
- Prompt drift: users ask things you didn't anticipate
- Retry amplification: each retry resends the full conversation, so a slightly longer conversation multiplies token usage
- Integration breakage: your CRM API changes and the agent fails silently 200 times before someone notices
- Context growth: conversations get longer, and so does every API call
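The context-growth and retry-amplification items above can be made concrete with a little arithmetic. The sketch below is a rough model, not a measurement. It assumes every call resends the full conversation history, that each turn adds a fixed number of tokens, and that attempts fail independently with a fixed probability. The function names and all input values are hypothetical.

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int, system_tokens: int = 0) -> int:
    """Total input tokens over a conversation when every call resends the full history.

    Assumption: each turn adds a fixed `tokens_per_turn` to the history.
    """
    total = 0
    history = system_tokens
    for _ in range(turns):
        history += tokens_per_turn  # the new exchange is appended to the history
        total += history            # the whole history is sent as input on this call
    return total


def expected_tokens_with_retries(base_tokens: float, failure_rate: float, max_retries: int) -> float:
    """Expected tokens when a failed attempt is retried, up to `max_retries` times.

    Assumption: attempts fail independently with probability `failure_rate`,
    so attempt k+1 only happens if the first k attempts all failed.
    """
    expected_attempts = sum(failure_rate ** k for k in range(max_retries + 1))
    return base_tokens * expected_attempts


if __name__ == "__main__":
    short = cumulative_input_tokens(turns=5, tokens_per_turn=200)
    longer = cumulative_input_tokens(turns=10, tokens_per_turn=200)
    print(short, longer, longer / short)
    print(expected_tokens_with_retries(longer, failure_rate=0.5, max_retries=2))
```

Under these assumed inputs, doubling the conversation from 5 to 10 turns raises cumulative input tokens from 3,000 to 11,000, which is well over double. Retries then multiply that figure again.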
Gartner predicts that more than 40% of agentic AI projects will be canceled by the end of 2027. That isn't because teams can't build agents. It's because teams don't build operations for agents.
The Three Cost Layers Nobody Talks About
When you add these three layers, the math changes. A "simple" agent that looked like a $500/month line item is actually a $4,000–8,000/month operational commitment — with a $60–120K build attached.
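To see how far the line item sits from the full commitment, here is the first-year arithmetic using the figures quoted above. The 12-month horizon and the choice to count the whole build cost in year one are assumptions made for illustration. The function name is hypothetical.

```python
def first_year_cost(monthly_run: float, build: float, months: int = 12) -> float:
    """Build cost plus operating cost over `months` (assumes no amortization)."""
    return build + monthly_run * months


if __name__ == "__main__":
    naive = first_year_cost(monthly_run=500, build=0)         # the "$500/month line item"
    low = first_year_cost(monthly_run=4_000, build=60_000)    # low end of the quoted ranges
    high = first_year_cost(monthly_run=8_000, build=120_000)  # high end of the quoted ranges
    print(f"naive: ${naive:,.0f}  low: ${low:,.0f}  high: ${high:,.0f}")
```

On these assumptions the first year comes to $108,000 to $216,000, against $6,000 for the naive line item.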
The Payback Question
Here is what teams that hit strong ROI did differently:
They chose high-volume, repeatable workflows first.
Invoice reconciliation pays back in 8 months. Supply chain emissions traceability takes 28 months. The difference in ROI timelines is enormous — and most teams pick the wrong starting use case because it sounds more interesting.
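A payback period is just the build cost divided by the net monthly benefit. The sketch below shows that calculation. The monthly benefit is a made-up input chosen to keep the arithmetic visible; it is not taken from the companies named above, and the function name is hypothetical.

```python
from typing import Optional


def payback_months(build_cost: float, monthly_benefit: float, monthly_run_cost: float) -> Optional[float]:
    """Months until cumulative net benefit covers the build cost.

    Returns None when the agent costs at least as much to run as it saves,
    because in that case it never pays back.
    """
    net_monthly = monthly_benefit - monthly_run_cost
    if net_monthly <= 0:
        return None
    return build_cost / net_monthly


if __name__ == "__main__":
    # Hypothetical workflow: $60K build, $4,000/month to run, $10,000/month saved.
    print(payback_months(60_000, monthly_benefit=10_000, monthly_run_cost=4_000))
    # Same build and run cost, but the workflow only saves $3,000/month.
    print(payback_months(60_000, monthly_benefit=3_000, monthly_run_cost=4_000))
```

The result is very sensitive to the net monthly figure, which is why a high-volume, repeatable workflow tends to pay back sooner than a low-volume one with the same build cost.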
They measured at the decision level, not the invoice level.
One company tracked "cost per customer query answered." Another tracked "cost per resolved ticket." Both looked at the same agent and drew different conclusions about whether it was working. The first was right.
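The gap between the two metrics is easy to reproduce: the same monthly spend divided by different denominators gives very different unit costs. The numbers below are hypothetical and only illustrate the mechanics.

```python
def unit_cost(monthly_cost: float, units: int) -> float:
    """Cost per unit of outcome for one month of agent spend."""
    if units <= 0:
        raise ValueError("units must be positive")
    return monthly_cost / units


if __name__ == "__main__":
    monthly_cost = 4_000.0     # hypothetical total agent spend for the month
    queries_answered = 10_000  # hypothetical volume
    tickets_resolved = 2_500   # hypothetical volume
    print(f"per query answered:  ${unit_cost(monthly_cost, queries_answered):.2f}")
    print(f"per resolved ticket: ${unit_cost(monthly_cost, tickets_resolved):.2f}")
```

Which denominator to trust depends on which unit matches the decision the business is actually making.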
They set cost guardrails before scaling.
Confidence thresholds, retry caps, context limits — these aren't restrictions on intelligence. They're the difference between a $3,500/month agent and a $12,000/month agent running the same workflow.
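As a rough illustration, the three guardrails can live in one small wrapper around the agent call. This is a minimal sketch under stated assumptions, not any particular framework's API. `call_agent` is a hypothetical callable that takes a list of messages and returns an answer with a confidence score, the context limit is counted in messages rather than tokens, and the default values are placeholders.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Guardrails:
    min_confidence: float = 0.8     # below this, do not act on the answer
    max_retries: int = 2            # retry cap: at most max_retries + 1 attempts
    max_context_messages: int = 20  # context limit, counted in messages


@dataclass
class Result:
    answer: Optional[str]
    attempts: int
    escalated: bool  # True when the request should go to a human


def run_with_guardrails(
    call_agent: Callable[[List[str]], Tuple[str, float]],  # hypothetical agent interface
    messages: List[str],
    guardrails: Guardrails,
) -> Result:
    # Context limit: send only the most recent messages.
    limit = guardrails.max_context_messages
    context = messages[-limit:] if limit > 0 else []

    attempts = 0
    for _ in range(guardrails.max_retries + 1):
        attempts += 1
        try:
            answer, confidence = call_agent(context)
        except Exception:
            # Broad catch for the sketch only; real code should catch specific errors.
            continue
        # Confidence threshold: accept only sufficiently confident answers.
        if confidence >= guardrails.min_confidence:
            return Result(answer=answer, attempts=attempts, escalated=False)

    # Retry cap reached without a confident answer: stop spending and escalate.
    return Result(answer=None, attempts=attempts, escalated=True)
```

The point of the wrapper is that the worst-case spend per request is bounded: at most `max_retries + 1` calls, each with at most `max_context_messages` messages of context.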
The honest take: Agents in production can generate 10x ROI. Genesys did. Hello Fresh did. TransGlobal did.
They also require the same operational rigor you'd apply to a production database — monitoring, maintenance, cost governance. The companies that treat agent deployment as "ship it and forget it" end up with runaway token bills and no explanation.
The pilots that converted to production are the ones where someone asked, every week: what's this actually costing, and is the outcome worth it?