TL;DR
CodeTracer is a tracing architecture that reconstructs an agent's state transitions and localizes failures in complex code agent workflows, supporting debugging and failure analysis.
Contribution
It introduces a hierarchical trace reconstruction and failure localization method, along with a large benchmark dataset for evaluating code agent debugging tools.
Findings
CodeTracer outperforms direct prompting and lightweight baselines in failure detection.
Replaying diagnostic signals with CodeTracer recovers failed runs under matched budgets.
The approach enables systematic failure analysis in complex code agent workflows.
Abstract
Code agents are advancing rapidly, but debugging them is becoming increasingly difficult. As frameworks orchestrate parallel tool calls and multi-stage workflows over complex tasks, the agent's state transitions and error propagation become hard to observe. In these runs, an early misstep can trap the agent in unproductive loops or even cascade into fundamental errors, forming hidden error chains that make it hard to tell when the agent goes off track and why. Existing agent tracing analyses either focus on simple interaction or rely on small-scale manual inspection, which limits their scalability and usefulness for real coding workflows. We present CodeTracer, a tracing architecture that parses heterogeneous run artifacts through evolving extractors, reconstructs the full state transition history as a hierarchical trace tree with persistent memory, and performs failure onset…
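To make the idea of a hierarchical trace tree concrete, here is a minimal Python sketch. It is an illustration only, not CodeTracer's implementation: the `TraceNode` schema, the `walk` and `failure_onset` helpers, and the example run are all hypothetical, and the sketch assumes each step already carries a pass/fail flag (`ok`), whereas deciding which step is at fault is the hard part the paper addresses.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class TraceNode:
    """One step in a reconstructed run (hypothetical schema, not CodeTracer's)."""
    step_id: int          # global order in which the step occurred
    label: str            # human-readable description of the step
    ok: bool = True       # assumed to be known already; a simplification
    children: List["TraceNode"] = field(default_factory=list)

    def add(self, child: "TraceNode") -> "TraceNode":
        """Attach a child step and return it so calls can be chained."""
        self.children.append(child)
        return child


def walk(node: TraceNode) -> Iterator[TraceNode]:
    """Yield nodes depth-first in pre-order (parent before its children)."""
    yield node
    for child in node.children:
        yield from walk(child)


def failure_onset(root: TraceNode) -> Optional[TraceNode]:
    """Return the failing step with the smallest step_id, or None if all passed."""
    failing = [n for n in walk(root) if not n.ok]
    return min(failing, key=lambda n: n.step_id) if failing else None


# A toy run: two stages, each containing tool calls.
root = TraceNode(0, "task: fix failing test")
locate = root.add(TraceNode(1, "stage: locate bug"))
locate.add(TraceNode(2, "tool: grep for symbol"))
locate.add(TraceNode(3, "tool: open wrong file", ok=False))
patch = root.add(TraceNode(4, "stage: apply patch"))
patch.add(TraceNode(5, "tool: run tests", ok=False))

onset = failure_onset(root)
```

In this toy run the test failure at step 5 is downstream of the misstep at step 3, so the earliest failing step is the one reported as the onset. This mirrors the "hidden error chain" the abstract describes, under the strong assumption that per-step failure labels are available.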
