Understanding Code Agent Behaviour: An Empirical Study of Success and Failure Trajectories
Oorja Majgaonkar, Zhiwei Fei, Xiang Li, Federica Sarro, He Ye

TL;DR
This empirical study analyzes the behaviour of state-of-the-art code agents during software issue resolution, revealing key strategies, differences between success and failure trajectories, and implications for building more robust autonomous systems.
Contribution
The paper provides the first detailed empirical analysis of code agent trajectories, uncovering behavioural patterns and the factors that influence success and failure in automated software engineering.
Findings
Successful trajectories often use defensive programming and context gathering.
Failed trajectories are longer and more variable than successful ones.
Most trajectories correctly identify problematic files even in failures.
Abstract
The increasing deployment of Large Language Model (LLM) agents for complex software engineering tasks has created a need to understand their problem-solving behaviours beyond simple success metrics. While these agents demonstrate impressive capabilities in automated issue resolution, their decision-making processes remain largely opaque. This paper presents an empirical study of agent trajectories, namely the execution traces capturing the steps agents take when attempting to resolve software issues. We analyse trajectories from three state-of-the-art code agents (OpenHands, SWE-agent, and Prometheus) on the SWE-Bench benchmark, examining both successful and failed attempts. Our investigation reveals several key insights into agent behaviour. First, we identify how distinct problem-solving strategies, such as defensive programming and context gathering, enable success in different…
Taxonomy
Topics: Software Engineering Research · Software Engineering Techniques and Practices · Software System Performance and Reliability
