A Survey of AIOps for Failure Management in the Era of Large Language Models
Lingzhe Zhang, Tong Jia, Mengxi Jia, Yifan Wu, Aiwei Liu, Yong Yang, Zhonghai Wu, Xuming Hu, Philip S. Yu, and Ying Li

TL;DR
This survey reviews how large language models are transforming AIOps for failure management, addressing traditional challenges and exploring new approaches, subtasks, and future directions in the field.
Contribution
It provides the first comprehensive comparison of LLM-based AIOps methods with traditional approaches, detailing tasks, data sources, and challenges.
Findings
LLMs enhance cross-platform and cross-task flexibility in AIOps.
Different LLM-based approaches suit different failure management subtasks.
The survey identifies key challenges and future research directions in LLM-driven AIOps.
Abstract
As software systems grow increasingly intricate, Artificial Intelligence for IT Operations (AIOps) methods have been widely used in software system failure management to ensure the high availability and reliability of large-scale distributed software systems. However, these methods still face several challenges, such as lack of cross-platform generality and cross-task flexibility. Fortunately, recent advancements in large language models (LLMs) can significantly address these challenges, and many approaches have already been proposed to explore this field. However, there is currently no comprehensive survey that discusses the differences between LLM-based AIOps and traditional AIOps methods. Therefore, this paper presents a comprehensive survey of AIOps technology for failure management in the LLM era. It includes a detailed definition of AIOps tasks for failure management, the data…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
