Dissecting Bug Triggers and Failure Modes in Modern Agentic Frameworks: An Empirical Study
Xiaowen Zhang, Hannuo Zhang, Shin Hwei Tan

TL;DR
This empirical study analyzes 409 bugs in modern agentic frameworks, revealing unique failure modes, root causes, and bug patterns to enhance system reliability.
Contribution
It introduces a five-layer abstraction for analyzing the structure of agentic frameworks and identifies bug patterns that transfer across frameworks.
Findings
Uncovered specialized symptoms like unexpected execution sequences.
Identified agent-specific root causes such as model faults and orchestration errors.
Discovered frequent bug-triggering patterns transferable across frameworks.
Abstract
Modern agentic frameworks (e.g., CrewAI and AutoGen) have evolved into complex, autonomous multi-agent systems, introducing unique reliability challenges beyond earlier pipeline-based LLM libraries. However, existing empirical studies focus on earlier LLM libraries or task-level bugs, leaving the unique complexities of these agentic frameworks unexplored. We bridge this gap by conducting a comprehensive study of 409 fixed bugs from five representative agentic frameworks. We propose a five-layer abstraction to capture structural complexities in agentic frameworks, spanning from orchestration to infrastructure. Our study uncovers specialized symptoms, such as unexpected execution sequences and ignored user configurations, which are unique to autonomous orchestration. We further identify agent-specific root causes, including model-related faults, cognitive context mismanagement, and…
