AgentAssay: Token-Efficient Regression Testing for Non-Deterministic AI Agent Workflows
Varun Pratap Bhardwaj

TL;DR
AgentAssay introduces a token-efficient, statistically grounded framework for regression testing non-deterministic AI agent workflows. It reports a 78-100% cost reduction while still detecting regressions reliably across multiple models and scenarios.
Contribution
AgentAssay is presented as the first comprehensive, token-efficient regression testing framework with statistical guarantees for non-deterministic AI agents, including novel metrics and offline analysis methods.
Findings
Behavioral fingerprinting achieves 86% detection power.
SPRT reduces testing trials by 78%.
Full pipeline achieves 100% cost savings.
Abstract
Autonomous AI agents are deployed at unprecedented scale, yet no principled methodology exists for verifying that an agent has not regressed after changes to its prompts, tools, models, or orchestration logic. We present AgentAssay, the first token-efficient framework for regression testing non-deterministic AI agent workflows, achieving 78-100% cost reduction while maintaining rigorous statistical guarantees. Our contributions include: (1) stochastic three-valued verdicts (PASS/FAIL/INCONCLUSIVE) grounded in hypothesis testing; (2) five-dimensional agent coverage metrics; (3) agent-specific mutation testing operators; (4) metamorphic relations for agent workflows; (5) CI/CD deployment gates as statistical decision procedures; (6) behavioral fingerprinting that maps execution traces to compact vectors, enabling multivariate regression detection; (7) adaptive budget…
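To make the idea of a three-valued verdict grounded in hypothesis testing concrete, here is a minimal sketch of a sequential probability ratio test (SPRT) over pass/fail trial outcomes. It is not the paper's implementation. The function name `sprt_verdict`, the baseline pass rate `p0`, the regressed pass rate `p1`, the error rates `alpha` and `beta`, and the trial cap `max_trials` are all assumptions chosen for illustration.

```python
import math
from typing import Iterable, Tuple


def sprt_verdict(
    outcomes: Iterable[bool],
    p0: float = 0.9,        # assumed acceptable pass rate (H0: no regression)
    p1: float = 0.7,        # assumed regressed pass rate (H1: regression)
    alpha: float = 0.05,    # assumed tolerated false-alarm rate
    beta: float = 0.05,     # assumed tolerated missed-regression rate
    max_trials: int = 100,  # assumed trial budget
) -> Tuple[str, int]:
    """Return (verdict, trials_used) using Wald's SPRT on Bernoulli outcomes."""
    upper = math.log((1 - beta) / alpha)   # crossing it accepts H1 -> FAIL
    lower = math.log(beta / (1 - alpha))   # crossing it accepts H0 -> PASS
    llr = 0.0                              # running log-likelihood ratio, H1 vs H0
    n = 0
    for n, ok in enumerate(outcomes, start=1):
        if ok:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "FAIL", n
        if llr <= lower:
            return "PASS", n
        if n >= max_trials:
            break
    # Budget or data exhausted before either boundary was crossed.
    return "INCONCLUSIVE", n


if __name__ == "__main__":
    print(sprt_verdict([True] * 50))               # ('PASS', 12)
    print(sprt_verdict([False] * 50))              # ('FAIL', 3)
    print(sprt_verdict([True, False, True, True])) # ('INCONCLUSIVE', 4)
```

With these assumed parameters the test stops as soon as the evidence is decisive, so it can use fewer trials than a fixed-size sample. If the evidence runs out before either boundary is crossed, it returns INCONCLUSIVE rather than forcing a binary answer.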
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Software Testing and Debugging Techniques · Explainable Artificial Intelligence (XAI)
