Cloud-OpsBench: A Reproducible Benchmark for Agentic Root Cause Analysis in Cloud Systems
Yilun Wang, Guangba Yu, Haiyu Huang, Zirui Wang, Yujie Huang, Pengfei Chen, Michael R. Lyu

TL;DR
Cloud-OpsBench is a reproducible benchmark for agentic root cause analysis (RCA) in cloud systems, built to support research on agent reasoning, policy training, and system diagnostics.
Contribution
The paper introduces a large-scale, deterministic benchmark, built on a State Snapshot Paradigm, for evaluating and developing agentic RCA methods in cloud environments.
Findings
Provides 452 fault cases across 40 root cause types
Enables supervised fine-tuning of language models for RCA
Creates a safe RL environment for policy optimization
Abstract
The transition to agentic Root Cause Analysis (RCA) necessitates benchmarks that evaluate active reasoning rather than passive classification. However, current frameworks fail to reconcile ecological validity with reproducibility. We introduce Cloud-OpsBench, a large-scale benchmark that employs a State Snapshot Paradigm to construct a deterministic digital twin of the cloud, featuring 452 distinct fault cases across 40 root cause types spanning the full Kubernetes stack. Crucially, Cloud-OpsBench serves as an enabling infrastructure for next-generation SRE research: (1) As a Data Engine, it harvests high-quality reasoning trajectories to bootstrap Supervised Fine-Tuning (SFT) for Small Language Models; (2) As a Reinforcement Learning (RL) environment, it transforms high-risk operations into a safe, low-latency sandbox for training policy optimization agents; and (3) As a Diagnostic…
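The idea of a deterministic, snapshot-backed environment can be illustrated with a minimal Python sketch. This is not the Cloud-OpsBench implementation: the class names (Snapshot, SnapshotEnv), the query strings, the step budget, the binary reward, and the sample observations are all hypothetical, chosen only to show how replaying recorded state makes every episode repeatable and keeps the agent away from a live cluster.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    """Recorded cluster state for one fault case (hypothetical schema)."""
    root_cause: str
    observations: dict  # maps a read-only query string to its recorded output


@dataclass
class SnapshotEnv:
    """Replays recorded observations instead of querying a live cluster."""
    snapshot: Snapshot
    max_steps: int = 5  # assumed step budget, not taken from the paper
    steps: int = field(default=0, init=False)

    def query(self, command: str) -> str:
        # Every call is a dictionary lookup, so results are deterministic.
        self.steps += 1
        if self.steps > self.max_steps:
            raise RuntimeError("step budget exhausted")
        return self.snapshot.observations.get(command, "error: no recorded output")

    def submit(self, diagnosis: str) -> float:
        # Assumed binary reward: 1.0 for the labeled root cause, else 0.0.
        return 1.0 if diagnosis == self.snapshot.root_cause else 0.0


# Made-up fault case, for illustration only.
snap = Snapshot(
    root_cause="image_pull_failure",
    observations={
        "get pods": "checkout-7d9f  0/1  ImagePullBackOff",
        "describe pod checkout-7d9f": "Failed to pull image: not found",
    },
)

env = SnapshotEnv(snap)
first = env.query("get pods")
again = SnapshotEnv(snap).query("get pods")  # a fresh episode sees the same state
```

Because the environment only reads from the frozen record, a destructive or unknown command simply returns an error string, which is one way a sandbox of this kind could stay safe for policy training.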
Taxonomy
Topics: Software System Performance and Reliability · Cloud Computing and Resource Management · Advanced Software Engineering Methodologies
