AgentRx: Diagnosing AI Agent Failures from Execution Trajectories
Shraddha Barke, Arnav Goyal, Alind Khare, Avaljot Singh, Suman Nath, Chetan Bansal

TL;DR
AgentRx is an automated diagnostic framework that localizes the critical failure step in AI agent trajectories, improving failure attribution across diverse tasks and reducing human effort.
Contribution
This work provides a benchmark of 115 annotated failed trajectories and a domain-agnostic diagnostic framework, AgentRx, for localizing and attributing failures in AI agents.
Findings
Improved accuracy in failure step localization.
Effective cross-domain failure attribution.
Benchmark dataset of 115 annotated failure trajectories.
Abstract
AI agents often fail in ways that are difficult to localize because executions are probabilistic, long-horizon, multi-agent, and mediated by noisy tool outputs. We address this gap by manually annotating failed agent runs and releasing a novel benchmark of 115 failed trajectories spanning structured API workflows, incident management, and open-ended web/file tasks. Each trajectory is annotated with a critical failure step and a category from a cross-domain failure taxonomy derived through grounded theory. To mitigate the human cost of failure attribution, we present AgentRx, an automated, domain-agnostic diagnostic framework that pinpoints the critical failure step in a failed agent trajectory. It synthesizes constraints, evaluates them step by step, and produces an auditable validation log of constraint violations with associated evidence; an LLM-based judge uses this log to localize the…
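The step-by-step constraint evaluation described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names Step, Constraint, Violation, build_validation_log, and earliest_violation are hypothetical, the single constraint is hand-written rather than synthesized, and picking the earliest violation is only a naive stand-in for the LLM-based judge.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    """One step of an agent trajectory (hypothetical schema)."""
    index: int
    actor: str
    action: str
    output: str


@dataclass
class Constraint:
    """A named check; `check` returns True when the step satisfies it."""
    name: str
    check: Callable[[Step], bool]


@dataclass
class Violation:
    """One entry of the validation log: which step broke which constraint."""
    step_index: int
    constraint: str
    evidence: str


def build_validation_log(trajectory: List[Step],
                         constraints: List[Constraint]) -> List[Violation]:
    """Evaluate every constraint on every step and record violations."""
    log = []
    for step in trajectory:
        for constraint in constraints:
            if not constraint.check(step):
                # Keep the raw step output as evidence so the log is auditable.
                log.append(Violation(step.index, constraint.name, step.output))
    return log


def earliest_violation(log: List[Violation]) -> Optional[int]:
    """Naive stand-in for the judge: return the first violating step index."""
    return min((v.step_index for v in log), default=None)


# Toy trajectory, invented for illustration only.
trajectory = [
    Step(0, "planner", "search_flights", "3 results"),
    Step(1, "executor", "book_flight", "ERROR: missing passenger_id"),
    Step(2, "executor", "send_confirmation", "sent"),
]
constraints = [
    Constraint("tool_output_has_no_error", lambda s: "ERROR" not in s.output),
]

log = build_validation_log(trajectory, constraints)
critical_step = earliest_violation(log)
```

Keeping the evidence alongside each violation is what would make such a log auditable: a reviewer can see why a step was flagged without rerunning the agent.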
Taxonomy
Topics: Software System Performance and Reliability · AI-based Problem Solving and Planning · Explainable Artificial Intelligence (XAI)
