CodeTracer: Towards Traceable Agent States

Han Li; Yifan Yao; Letian Zhu; Rili Feng; Hongyi Ye; Jiaming Wang; Yancheng He; Pengyu Zou; Lehan Zhang; Xinping Lei; Haoyang Huang; Ken Deng; Ming Sun; Zhaoxiang Zhang; He Ye; Jiaheng Liu

arXiv:2604.11641·cs.SE·April 16, 2026

TL;DR

CodeTracer is a novel tracing architecture that reconstructs agent state transitions and localizes failures in complex code agent workflows, improving debugging and failure analysis.

Contribution

It introduces a hierarchical trace reconstruction and failure localization method, along with a large benchmark dataset for evaluating code agent debugging tools.
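Here is a minimal Python sketch of a hierarchical trace with failure localization. It assumes a trace in which a run contains stages and each stage contains individual actions, and each node is marked as succeeded or failed. The names `TraceNode` and `failure_onset` are hypothetical. The localization rule is a simple stand-in chosen for illustration, not CodeTracer's published method: it checks children in execution order before their parent and returns the earliest, most specific failing node.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TraceNode:
    """Hypothetical trace node: a run, a stage, or a single agent action."""
    name: str
    ok: bool = True  # outcome of this node's own action
    children: list[TraceNode] = field(default_factory=list)

    def add(self, child: TraceNode) -> TraceNode:
        """Attach a child in execution order and return it for chaining."""
        self.children.append(child)
        return child


def failure_onset(root: TraceNode) -> Optional[list[str]]:
    """Return the path to the earliest, most specific failing node, or None.

    Hypothetical rule: children are checked in execution order before their
    parent, so a failed stage is attributed to the first failing action inside it.
    """
    def visit(node: TraceNode, path: list[str]) -> Optional[list[str]]:
        path = path + [node.name]
        for child in node.children:
            found = visit(child, path)
            if found is not None:
                return found
        return path if not node.ok else None

    return visit(root, [])


# Usage: a run whose "edit" stage fails at "apply_patch", followed by a failing test.
run = TraceNode("run")
run.add(TraceNode("plan"))
edit = run.add(TraceNode("edit", ok=False))
edit.add(TraceNode("read_file"))
edit.add(TraceNode("apply_patch", ok=False))
run.add(TraceNode("test", ok=False))
print(failure_onset(run))  # ['run', 'edit', 'apply_patch']
```

Because children are checked before the parent, a failed stage is reported only when none of its actions failed. The later `test` failure is not reported because it comes after the onset, so it is treated as a downstream effect.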

Findings

01

CodeTracer outperforms direct prompting and lightweight baselines in failure detection.

02

Replaying diagnostic signals with CodeTracer recovers failed runs under matched budgets.

03

The approach enables systematic failure analysis in complex code agent workflows.

Abstract

Code agents are advancing rapidly, but debugging them is becoming increasingly difficult. As frameworks orchestrate parallel tool calls and multi-stage workflows over complex tasks, the agent's state transitions and error propagation become hard to observe. In these runs, an early misstep can trap the agent in unproductive loops or even cascade into fundamental errors, forming hidden error chains that make it hard to tell when the agent goes off track and why. Existing agent tracing analyses either focus on simple interactions or rely on small-scale manual inspection, which limits their scalability and usefulness for real coding workflows. We present CodeTracer, a tracing architecture that parses heterogeneous run artifacts through evolving extractors, reconstructs the full state transition history as a hierarchical trace tree with persistent memory, and performs failure onset…
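A small extractor registry can illustrate the parsing step described in the abstract, which turns heterogeneous run artifacts into uniform records. The sketch below is hypothetical: the `Step` schema, the two log formats, the `extractor` decorator and `parse_artifacts` are assumptions made for illustration, not CodeTracer's actual extractors. Each extractor either recognizes a raw line or declines it. Lines that no extractor recognizes are set aside so that a new extractor can be registered for them later, which is one simple reading of "evolving".

```python
from __future__ import annotations

import json
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    """Hypothetical normalized record for one agent action."""
    kind: str   # e.g. "tool_call" or "shell"
    name: str
    ok: bool


# An extractor parses one raw artifact line, or returns None if it does not apply.
Extractor = Callable[[str], Optional[Step]]
EXTRACTORS: list[Extractor] = []


def extractor(fn: Extractor) -> Extractor:
    """Register an extractor; support for new formats is added by registering more."""
    EXTRACTORS.append(fn)
    return fn


@extractor
def json_tool_call(line: str) -> Optional[Step]:
    """Hypothetical format: one JSON object per tool call with "tool" and "status"."""
    try:
        rec = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(rec, dict) or "tool" not in rec:
        return None
    return Step("tool_call", str(rec["tool"]), rec.get("status") == "ok")


SHELL_LINE = re.compile(r"^\$ (\S+).*\(exit (\d+)\)$")


@extractor
def shell_log(line: str) -> Optional[Step]:
    """Hypothetical format: '$ <command> ... (exit <code>)'."""
    m = SHELL_LINE.match(line)
    if m is None:
        return None
    return Step("shell", m.group(1), m.group(2) == "0")


def parse_artifacts(lines: list[str]) -> tuple[list[Step], list[str]]:
    """Apply the first matching extractor to each line and collect unparsed lines."""
    steps: list[Step] = []
    unparsed: list[str] = []
    for line in lines:
        for ex in EXTRACTORS:
            step = ex(line)
            if step is not None:
                steps.append(step)
                break
        else:  # no extractor recognized this line
            unparsed.append(line)
    return steps, unparsed


# Usage with a mixed log.
log = [
    '{"tool": "read_file", "status": "ok"}',
    "$ pytest -q (exit 1)",
    "Thinking about the next step...",
]
steps, unparsed = parse_artifacts(log)
print(steps)
print(unparsed)  # ['Thinking about the next step...']
```

Extractors are tried in registration order, and the first one that recognizes a line produces its record. A growing `unparsed` list signals that the artifacts contain a format no extractor yet covers.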
