AIOps Solutions for Incident Management: Technical Guidelines and A Comprehensive Literature Review
Youcef Remil, Anes Bendimerad, Romain Mathonat, Mehdi Kaytoue

TL;DR
This paper reviews AIOps for incident management, proposing a structured taxonomy, guidelines, and categorization to address the lack of standardization and facilitate future research and development in the field.
Contribution
It introduces AIOps terminology, a taxonomy, and a structured incident management framework, addressing the lack of standardization and providing a foundation for future work.
Findings
Proposed a comprehensive AIOps taxonomy and incident management procedure.
Categorized existing contributions based on tasks, data sources, and approaches.
Identified research gaps and future directions in AIOps for incident management.
Abstract
The management of modern IT systems poses unique challenges, necessitating scalability, reliability, and efficiency in handling extensive data streams. Traditional methods, reliant on manual tasks and rule-based approaches, prove inefficient for the substantial data volumes and alerts generated by IT systems. Artificial Intelligence for Operating Systems (AIOps) has emerged as a solution, leveraging advanced analytics like machine learning and big data to enhance incident management. AIOps detects and predicts incidents, identifies root causes, and automates healing actions, improving quality and reducing operational costs. However, despite its potential, the AIOps domain is still in its early stages, decentralized across multiple sectors, and lacking standardized conventions. Research and industrial contributions are distributed without consistent frameworks for data management, target…
Taxonomy
Topics: Risk and Safety Analysis · Occupational Health and Safety Research
