Outage-Watch: Early Prediction of Outages using Extreme Event Regularizer
Shubham Agarwal, Sarthak Chakraborty, Shaddy Garg, Sumit Bisht, Chahat Jain, Ashritha Gonuguntla, Shiv Saini

TL;DR
Outage-Watch is a novel predictive method that uses an extreme event regularizer and Gaussian mixture models to forecast critical cloud service outages in advance, significantly reducing detection time.
Contribution
It introduces a new approach combining QoS metrics, Gaussian mixture modeling, and an extreme event regularizer for early outage prediction in cloud services.
Findings
Achieved an average AUC of 0.98 in outage prediction.
Reduced Mean Time To Detection (MTTD) by up to 88%.
Detected all outages that exhibited a significant change in the metrics.
Abstract
Cloud services are omnipresent, and critical cloud service failures are a fact of life. To retain customers and prevent revenue loss, it is important to provide high reliability guarantees for these services. One way to do this is to predict outages in advance, which can help reduce both their severity and the time to recovery. Critical failures are difficult to forecast because they are rare. Moreover, critical failures are ill-defined in terms of observable data. Our proposed method, Outage-Watch, defines critical service outages as deteriorations in the Quality of Service (QoS) captured by a set of metrics. Outage-Watch detects such outages in advance by using the current system state to predict whether the QoS metrics will cross a threshold and initiate an extreme event. A mixture of Gaussians is used to model the distribution of the QoS metrics for flexibility…
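To make the threshold-crossing idea concrete, here is a minimal sketch of how the probability that a single QoS metric exceeds a threshold could be computed once a Gaussian mixture for that metric is available. It is an illustration under stated assumptions, not the authors' implementation: the function names, the mixture parameters, and the threshold are all hypothetical, and in the paper the distribution is predicted from the current system state rather than hard-coded.

```python
import math


def gaussian_tail(threshold, mean, std):
    """Return P(X > threshold) for X ~ N(mean, std^2)."""
    z = (threshold - mean) / (std * math.sqrt(2.0))
    return 0.5 * math.erfc(z)


def mixture_tail(threshold, weights, means, stds):
    """Return P(X > threshold) for a one-dimensional Gaussian mixture.

    The tail probability of a mixture is the weighted sum of the
    tail probabilities of its components.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    return sum(
        w * gaussian_tail(threshold, m, s)
        for w, m, s in zip(weights, means, stds)
    )


# Hypothetical mixture for a latency-like metric: a dominant "normal"
# component and a small heavy component representing degraded behaviour.
weights = [0.95, 0.05]
means = [100.0, 400.0]
stds = [10.0, 50.0]

# Hypothetical threshold above which the metric counts as an extreme event.
p_extreme = mixture_tail(300.0, weights, means, stds)
print(f"P(metric > 300) = {p_extreme:.4f}")
```

A mixture is convenient here because the small second component can put probability mass in the tail without distorting the fit around typical values, which a single Gaussian cannot do.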
