Carbon to Diamond: An Incident Remediation Assistant System From Site Reliability Engineers' Conversations in Hybrid Cloud Operations
Suranjana Samanta, Ajay Gupta, Prateeti Mohapatra, Amar Prakash Azad

TL;DR
This paper presents a framework that leverages learning methods to extract key incident artefacts from SRE conversations in hybrid cloud operations, improving incident remediation efficiency.
Contribution
It introduces a novel approach to understand and extract artefacts from semi-formal, domain-specific conversations, addressing challenges in applying standard NLP techniques.
Findings
Effective extraction of diagnostic steps and resolution actions.
Successful identification of similar past conversations.
Demonstrated efficacy on a real-world dataset.
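To make the idea of retrieving similar past conversations concrete, here is a minimal sketch that ranks past conversations by bag-of-words cosine similarity. This is an illustrative baseline, not the method used in the paper: the function names (`tokenize`, `cosine`, `most_similar`) and the sample conversations are hypothetical, and the summary above does not state which similarity technique the authors use.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase the text and keep alphanumeric tokens; '-' and '_' are kept
    # so identifiers such as "kube-proxy" survive as a single token.
    return re.findall(r"[a-z0-9_-]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two term-frequency Counters.
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, past_conversations):
    # Return (index, score) of the best-matching past conversation,
    # or None when there is nothing to compare against.
    if not past_conversations:
        return None
    q = Counter(tokenize(query))
    scores = [cosine(q, Counter(tokenize(c))) for c in past_conversations]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, scores[best]


# Hypothetical sample data, invented for illustration only.
past = [
    "pod crashloop in kubernetes cluster restart deployment",
    "database disk full delete old logs",
    "tls certificate expired renew cert",
]
result = most_similar("kubernetes pod keeps restarting crashloop", past)
```

A plain term-overlap measure like this would likely struggle with the domain-specific vocabulary and semi-formal phrasing the abstract describes, which is part of the motivation for a dedicated framework.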
Abstract
Conversational channels are changing the landscape of hybrid cloud service management. These channels are becoming important avenues for Site Reliability Engineers (SREs) to work together to resolve an incident or issue. Identifying segmented conversations and extracting key insights or artefacts from them can help engineers improve the efficiency of the incident remediation process by using information retrieval mechanisms for similar incidents. However, it has been empirically observed that, because such conversations are semi-formal human language, they are unique in nature and contain many domain-specific terms. This makes it difficult to directly apply the natural language processing frameworks that are popular for standard NLP tasks. …
Taxonomy
Topics: Software System Performance and Reliability · Data Quality and Management · Risk and Safety Analysis
