• Many organizations deploy AI agents but underestimate the costs of testing and evaluating outputs, especially due to the non-deterministic nature of results.
  • According to surveys, nearly 80% of businesses have used AI agents, but most did not anticipate training and evaluation costs, leading to severe budget overruns.
  • Lior Gavish, CTO of Monte Carlo, states that many companies use “LLM as a judge” to score outputs, which can make evaluation costs higher than the cost of running the agent itself.
  • One LLM-based evaluation run lasting several days left Monte Carlo with a five-figure invoice, showing that each LLM call costs far more than a comparable operation in traditional software.
  • Using an LLM to judge another LLM also carries a risk of bias, and the results are not repeatable: the same test can yield different results on each run.
  • Evaluation costs depend on agent complexity: evaluating a small agent may cost a few thousand USD, while evaluating a complex one can reach tens of thousands.
  • Besides compute and API fees, the largest cost, and one that is often overlooked, is human evaluation to establish a “ground truth.”
  • Paul Ferguson of Clearlead AI Consulting emphasizes that for vague use cases such as customer service, it is very difficult to define what counts as a right or wrong answer.
  • Chengyu “Cay” Zhang of Redcar.ai calls evaluation “insurance”; cutting it merely creates technical debt that has to be paid later.
  • Evaluation methods include cheap unit tests, AI-based scoring, red-teaming, and expensive human shadowing.
  • Recommendations: narrow the agent’s scope, use frameworks like LangSmith, PromptLayer, or Ragas, test early, and set evaluation budget limits.
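
To make the cost dynamics in the points above concrete, here is a rough back-of-the-envelope sketch in Python. It is only an illustration under stated assumptions: the per-token prices, token counts, and every name in it (JudgeCostModel, estimate_eval_cost, BudgetGuard) are hypothetical and do not come from the article or from any vendor's price list. It shows how the cost of “LLM as a judge” grows with the number of test cases and repeated runs, and how a simple evaluation budget limit could stop a run before the invoice grows unexpectedly.

```python
from dataclasses import dataclass


@dataclass
class JudgeCostModel:
    """Hypothetical cost model for one LLM-as-a-judge call.

    All prices and token counts are illustrative assumptions,
    not real vendor pricing.
    """
    input_price_per_1k: float   # USD per 1,000 input tokens
    output_price_per_1k: float  # USD per 1,000 output tokens
    input_tokens: int           # tokens sent to the judge per call
    output_tokens: int          # tokens returned by the judge per call

    def cost_per_call(self) -> float:
        return (self.input_tokens / 1000 * self.input_price_per_1k
                + self.output_tokens / 1000 * self.output_price_per_1k)


def estimate_eval_cost(model: JudgeCostModel, test_cases: int, repeats: int) -> float:
    """Total judge cost: one call per test case, repeated `repeats` times
    to average out non-deterministic scores."""
    return model.cost_per_call() * test_cases * repeats


class BudgetExceeded(RuntimeError):
    """Raised when the next charge would exceed the evaluation budget."""


class BudgetGuard:
    """Tracks spending and refuses any charge that would pass the limit."""

    def __init__(self, limit_usd: float) -> None:
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def charge(self, amount_usd: float) -> None:
        if self.spent_usd + amount_usd > self.limit_usd:
            raise BudgetExceeded(
                f"charge of {amount_usd:.4f} USD would exceed "
                f"the {self.limit_usd:.2f} USD limit"
            )
        self.spent_usd += amount_usd


if __name__ == "__main__":
    # Assumed figures, chosen only to make the arithmetic easy to follow.
    model = JudgeCostModel(
        input_price_per_1k=0.01,
        output_price_per_1k=0.03,
        input_tokens=2000,
        output_tokens=500,
    )
    total = estimate_eval_cost(model, test_cases=5000, repeats=3)
    print(f"Estimated judge cost: {total:.2f} USD")

    guard = BudgetGuard(limit_usd=100.0)
    calls_made = 0
    try:
        for _ in range(5000 * 3):
            guard.charge(model.cost_per_call())
            calls_made += 1
    except BudgetExceeded:
        print(f"Stopped after {calls_made} judge calls to stay within budget")
```

With these assumed numbers, scoring every test case three times triples the judge bill, which is one way repeated runs can push evaluation costs above the cost of running the agent itself. The guard raises before the limit is crossed, so a long run fails early rather than overspending.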

📌 Conclusion: Nearly 80% of businesses have used AI agents, but most failed to foresee training and evaluation costs, resulting in serious budget overruns. AI agents incur not only deployment costs but also an “unpredictable multiplier” from evaluation. Businesses are often shocked by testing expenses, especially when evaluation requires LLM-on-LLM scoring and human oversight. A sustainable approach involves narrowing the agent's scope, starting with use cases that have clear answers, testing early, using specialized frameworks, and treating evaluation as mandatory insurance against future brand and operational risks.
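
The advice to start with use cases that have clear answers, and the point about non-repeatable judge results, can both be illustrated with a short Python sketch. This is a minimal example under stated assumptions, not a method from the article or from any evaluation framework: exact_match_check, agreement_rate, and make_noisy_judge are hypothetical names, and the "judge" is a seeded random stub standing in for a real LLM call.

```python
import random
from collections import Counter
from typing import Callable


def exact_match_check(agent_output: str, expected: str) -> bool:
    """Cheap deterministic check for use cases with one clear answer.

    Ignores surrounding whitespace and letter case.
    """
    return agent_output.strip().lower() == expected.strip().lower()


def agreement_rate(judge: Callable[[str], str], output: str, runs: int = 5) -> float:
    """Call the judge `runs` times on the same output and return the share
    of runs that agree with the most common verdict.

    A value of 1.0 means the judge was fully repeatable on this input.
    """
    verdicts = [judge(output) for _ in range(runs)]
    most_common_count = Counter(verdicts).most_common(1)[0][1]
    return most_common_count / runs


def make_noisy_judge(flip_probability: float, seed: int) -> Callable[[str], str]:
    """Hypothetical stand-in for an LLM judge.

    Returns "pass" by default and flips to "fail" with the given
    probability, imitating run-to-run variation.
    """
    rng = random.Random(seed)

    def judge(_output: str) -> str:
        return "fail" if rng.random() < flip_probability else "pass"

    return judge


if __name__ == "__main__":
    # Clear-answer case: no LLM call is needed at all.
    print(exact_match_check(" Paris ", "paris"))

    # Vague case: measure how often the (stubbed) judge agrees with itself.
    noisy_judge = make_noisy_judge(flip_probability=0.3, seed=42)
    print(agreement_rate(noisy_judge, "Sorry for the delay, here is your refund.", runs=10))
```

The exact-match check costs essentially nothing per test, which is why clear-answer use cases are cheaper to evaluate. A low agreement rate on a fixed input suggests the judge's score needs repeated runs or human review before it can be trusted, and each of those adds cost.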

VIET NAM CONSULTING AND MEASUREMENT JOINT STOCK COMPANY
© 2026 Vietmetric