When Agents Fail: A Comprehensive Study of Bugs in LLM Agents with Automated Labeling
Niful Islam, Ragib Shahriar Ayon, Deepak George Thomas, Shibbir Ahmed, Mohammad Wardat

TL;DR
This paper conducts a comprehensive analysis of bugs in LLM-based agents, examining bug types, causes, and effects, and explores automated bug detection using a specialized ReAct agent.
Contribution
It provides the first large-scale study of bugs in LLM agents and evaluates an automated bug annotation approach with a novel ReAct-based system.
Findings
Analyzed 1,187 bug-related posts and code snippets from Stack Overflow, GitHub, and Hugging Face forums.
BugReAct with Gemini 2.5 Flash effectively detects and annotates bugs.
Automated bug annotation costs approximately 0.01 USD per case.
Abstract
Large Language Models (LLMs) have revolutionized intelligent application development. While standalone LLMs cannot perform any actions, LLM agents address this limitation by integrating tools. However, debugging LLM agents is difficult and costly as the field is still in its early stage and the community is underdeveloped. To understand the bugs encountered during agent development, we present the first comprehensive study of bug types, root causes, and effects in LLM agent-based software. We collected and analyzed 1,187 bug-related posts and code snippets from Stack Overflow, GitHub, and Hugging Face forums, focusing on LLM agents built with seven widely used LLM frameworks as well as custom implementations. For a deeper analysis, we also studied the component where each bug occurred, along with the programming language and framework. This study also investigates the feasibility of…
