Dissecting Bug Triggers and Failure Modes in Modern Agentic Frameworks: An Empirical Study

Xiaowen Zhang; Hannuo Zhang; Shin Hwei Tan

arXiv:2604.08906·cs.SE·April 13, 2026

TL;DR

This empirical study analyzes 409 fixed bugs from five representative agentic frameworks, revealing their failure modes, root causes, and bug-triggering patterns with the aim of improving system reliability.

Contribution

The paper introduces a five-layer abstraction, spanning from orchestration to infrastructure, for structural analysis, and identifies bug patterns that transfer across frameworks.

Findings

01

Uncovered specialized symptoms unique to autonomous orchestration, such as unexpected execution sequences and ignored user configurations.

02

Identified agent-specific root causes such as model faults and orchestration errors.

03

Discovered frequent bug-triggering patterns transferable across frameworks.

Abstract

Modern agentic frameworks (e.g., CrewAI and AutoGen) have evolved into complex, autonomous multi-agent systems, introducing unique reliability challenges beyond earlier pipeline-based LLM libraries. However, existing empirical studies focus on earlier LLM libraries or task-level bugs, leaving the unique complexities of these agentic frameworks unexplored. We bridge the gap by conducting a comprehensive study of 409 fixed bugs from five representative agentic frameworks. We propose a five-layer abstraction to capture structural complexities in agentic frameworks, spanning from orchestration to infrastructure. Our study uncovers specialized symptoms, such as unexpected execution sequences and user configurations ignored, which are unique to autonomous orchestration. We further identify agent-specific root causes, including model-related faults, cognitive context mismanagement, and…
