Why Do AI Agents Systematically Fail at Cloud Root Cause Analysis?
Taeyoon Kim, Woohyeok Park, Hoyeong Yun, Kyungyong Lee

TL;DR
This paper analyzes why LLM-based agents fail at cloud root cause analysis by classifying failure types, revealing persistent pitfalls across models, and evaluating mitigation strategies to improve reliability.
Contribution
It introduces a comprehensive failure taxonomy and diagnostic methodology for LLM-based RCA agents, highlighting architecture-related issues beyond model capabilities.
Findings
Hallucinated data interpretation is a common failure.
Incomplete exploration persists across models.
Enriching communication protocols reduces failures by up to 15%.
Abstract
Failures in large-scale cloud systems incur substantial financial losses, making automated Root Cause Analysis (RCA) essential for operational stability. Recent efforts leverage Large Language Model (LLM) agents to automate this task, yet existing systems exhibit low detection accuracy even with capable models, and current evaluation frameworks assess only final answer correctness without revealing why the agent's reasoning failed. This paper presents a process level failure analysis of LLM-based RCA agents. We execute the full OpenRCA benchmark across five LLM models, producing 1,675 agent runs, and classify observed failures into 12 pitfall types across intra-agent reasoning, inter-agent communication, and agent-environment interaction. Our analysis reveals that the most prevalent pitfalls, notably hallucinated data interpretation and incomplete exploration, persist across all models…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSoftware System Performance and Reliability · Explainable Artificial Intelligence (XAI) · Big Data and Digital Economy
