Nissist: An Incident Mitigation Copilot based on Troubleshooting Guides
Kaikai An, Fangkai Yang, Junting Lu, Liqun Li, Zhixing Ren, Hao Huang, Lu Wang, Pu Zhao, Yu Kang, Hua Ding, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang

TL;DR
Nissist utilizes large language models to analyze troubleshooting guides and incident histories, providing proactive incident mitigation suggestions that reduce resolution time and improve operational efficiency in cloud services.
Contribution
This work introduces Nissist, a novel LLM-based system that automates and enhances incident mitigation by extracting insights from unstructured troubleshooting guides and incident mitigation histories.
Findings
Significantly reduces Time to Mitigate (TTM) in incident handling.
Alleviates operational burden on on-call engineers.
Improves overall service reliability.
Abstract
Effective incident management is pivotal for the smooth operation of enterprise-level cloud services. To expedite incident mitigation, service teams compile troubleshooting knowledge into Troubleshooting Guides (TSGs) that are accessible to on-call engineers (OCEs). While automated pipelines can resolve the most frequent and simplest incidents, complex incidents still require OCE intervention. However, TSGs are often unstructured and incomplete and must be interpreted manually by OCEs, which leads to on-call fatigue and decreased productivity, especially among newly hired OCEs. In this work, we propose Nissist, which leverages TSGs and incident mitigation histories to provide proactive suggestions, reducing human intervention. Leveraging Large Language Models (LLMs), Nissist extracts insights from unstructured TSGs and historical incident mitigation discussions,…
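The abstract does not say how Nissist represents the knowledge it extracts from TSGs. As a minimal sketch, assuming a TSG has already been distilled into a tree of condition/action steps, a copilot could propose mitigation actions for an incident by walking only the branches whose conditions match the incident's signals. The `TSGNode` class, the `suggest_steps` function, and the sample steps and incident fields (`cpu`, `cert_expired`) are all hypothetical illustrations, not Nissist's actual design.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class TSGNode:
    """Hypothetical node of a structured TSG: an optional check plus the action to suggest."""
    description: str
    action: str
    # None means the step always applies once its parent has matched.
    condition: Optional[Callable[[dict], bool]] = None
    children: list["TSGNode"] = field(default_factory=list)


def suggest_steps(node: TSGNode, incident: dict) -> list[str]:
    """Depth-first walk that collects actions from every branch whose condition matches."""
    if node.condition is not None and not node.condition(incident):
        return []  # This branch does not apply to the incident; prune it.
    steps = [node.action]
    for child in node.children:
        steps.extend(suggest_steps(child, incident))
    return steps


# Hypothetical TSG distilled into a tree (illustrative steps only).
root = TSGNode(
    "Service health check",
    "Check the service dashboard for an error-rate spike",
    children=[
        TSGNode("High CPU", "Scale out the affected node pool",
                condition=lambda i: i.get("cpu", 0) > 90),
        TSGNode("Certificate expiry", "Rotate the expired certificate",
                condition=lambda i: i.get("cert_expired", False)),
    ],
)

if __name__ == "__main__":
    # An incident reporting 95% CPU matches only the high-CPU branch.
    print(suggest_steps(root, {"cpu": 95}))
```

Keeping each condition as a plain predicate over the incident's signals makes the tree easy to audit by an OCE; in an LLM-based system, the conditions and actions would presumably come from the model's reading of the TSG text rather than being hand-written as they are here.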
