AIOpsLab: A Holistic Framework to Evaluate AI Agents for Enabling Autonomous Clouds
Yinfang Chen, Manish Shetty, Gagan Somashekar, Minghua Ma, Yogesh Simmhan, Jonathan Mace, Chetan Bansal, Rujia Wang, Saravan Rajmohan

TL;DR
AIOPSLAB is a comprehensive framework designed to evaluate AI agents in autonomous cloud management, supporting end-to-end operational automation and facilitating insights into agent capabilities and limitations.
Contribution
The paper introduces AIOPSLAB, a holistic framework for deploying, orchestrating, and evaluating AI agents in cloud environments, advancing the development of autonomous cloud operations.
Findings
Evaluated state-of-the-art LLM agents within the AIOPSLAB benchmark.
Identified capabilities of AI agents in complex operational tasks.
Highlighted limitations of current AI agents in cloud management.
Abstract
AI for IT Operations (AIOps) aims to automate complex operational tasks, such as fault localization and root cause analysis, to reduce human workload and minimize customer impact. While traditional DevOps tools and AIOps algorithms often focus on addressing isolated operational tasks, recent advances in Large Language Models (LLMs) and AI agents are revolutionizing AIOps by enabling end-to-end and multitask automation. This paper envisions a future where AI agents autonomously manage operational tasks throughout the entire incident lifecycle, leading to self-healing cloud systems, a paradigm we term AgentOps. Realizing this vision requires a comprehensive framework to guide the design, development, and evaluation of these agents. To this end, we present AIOPSLAB, a framework that not only deploys microservice cloud environments, injects faults, generates workloads, and exports telemetry…
