Leveraging Cloud Data to Mitigate User Experience from "Breaking Bad"
Nicholas A. James, Arun Kejariwal, David S. Matteson

TL;DR
This paper introduces a robust statistical method that uses Energy Statistics and permutation tests to detect breakouts in cloud data. It improves accuracy and speed over existing techniques, and the method is currently deployed at Twitter.
Contribution
The paper presents the first breakout detection method for cloud data that is robust to anomalies. It uses Energy Statistics to locate breakouts and permutation tests to assess their statistical significance.
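The following minimal Python sketch makes these two ingredients concrete. It runs a single-breakout search driven by an energy statistic and assesses significance with a permutation test. It assumes absolute differences as the distance (exponent 1), the U-statistic form of the energy divergence scaled by m·k/(m+k), and at most one breakout. The function names and the `min_size`, `permutations`, `significance` and `seed` parameters are illustrative choices, not taken from the paper. The sketch also keeps plain averages of pairwise distances, whereas the abstract states that the deployed technique relies on the median for robustness. Treat it as an outline of the general approach rather than the authors' implementation.

```python
import random


def _scan(z, min_size):
    """Find the split tau maximizing the scaled energy divergence of z[:tau] vs z[tau:].

    Uses absolute differences (exponent 1) and the U-statistic form
    E = 2*mean|x - y| - mean|x - x'| - mean|y - y'|, scaled by m*k/(m + k).
    Within-segment sums are updated incrementally as tau moves right.
    """
    n = len(z)
    total = sum(abs(z[i] - z[j]) for i in range(n) for j in range(i + 1, n))
    within_left, within_right = 0.0, total  # all points start in the right segment
    best_tau, best_stat = None, float("-inf")
    for tau in range(1, n):
        # Move z[tau - 1] from the right segment into the left segment.
        x = z[tau - 1]
        within_left += sum(abs(x - z[i]) for i in range(tau - 1))
        within_right -= sum(abs(x - z[j]) for j in range(tau, n))
        m, k = tau, n - tau
        if m < min_size or k < min_size:
            continue
        between = total - within_left - within_right
        energy = (2.0 * between / (m * k)
                  - within_left / (m * (m - 1) / 2)
                  - within_right / (k * (k - 1) / 2))
        stat = m * k / (m + k) * energy
        if stat > best_stat:
            best_tau, best_stat = tau, stat
    return best_tau, best_stat


def detect_breakout(z, min_size=5, permutations=199, significance=0.05, seed=0):
    """Return (breakout index or None, permutation p-value) for a single breakout.

    A breakout index tau means the new regime starts at z[tau].
    """
    if min_size < 2 or len(z) < 2 * min_size:
        raise ValueError("need min_size >= 2 and len(z) >= 2 * min_size")
    tau, observed = _scan(z, min_size)
    rng = random.Random(seed)
    shuffled = list(z)
    exceed = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # shuffling destroys any breakout (null hypothesis)
        if _scan(shuffled, min_size)[1] >= observed:
            exceed += 1
    p_value = (exceed + 1) / (permutations + 1)
    return (tau if p_value <= significance else None), p_value


if __name__ == "__main__":
    rng = random.Random(1)
    series = [rng.gauss(0.0, 1.0) for _ in range(40)] + [rng.gauss(3.0, 1.0) for _ in range(40)]
    print(detect_breakout(series, permutations=99))
```

The permutation test needs no distributional assumptions. Shuffling the series destroys any breakout, so the maxima computed on shuffled copies form a reference distribution for the observed maximum. The cost is one full scan per permutation, which is why the sketch updates the within-segment sums incrementally instead of recomputing them at every split.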
Findings
The proposed technique is 3.5 times faster than a state-of-the-art breakout detection technique.
It achieves high precision, recall, and F-measure on real-world data.
It is currently used at Twitter on a daily basis.
Abstract
Low latency and high availability of an app or a web service are key, amongst other factors, to the overall user experience (which in turn directly impacts the bottom line). Exogenic and/or endogenic factors often give rise to breakouts in cloud data, which makes maintaining high availability and delivering high performance very challenging. Although there is a large body of prior research in breakout detection, existing techniques are not suitable for detecting breakouts in cloud data because they are not robust in the presence of anomalies. To this end, we developed a novel statistical technique to automatically detect breakouts in cloud data. In particular, the technique employs Energy Statistics to detect breakouts in both application and system metrics. Further, the technique uses robust statistical metrics, viz., median, and estimates the statistical significance of a…
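The abstract attributes the technique's robustness to the median. One way to see why that helps is a hypothetical median-based analogue of the energy divergence, in which every average of pairwise distances is replaced by a median, so a few extreme values barely move it. The Python sketch below uses illustrative names (`median_divergence`, `locate_breakout`, `min_size`). It only locates the most likely breakout. It is not necessarily the paper's exact statistic, it omits any significance test, and its naive per-split computation suits only short series.

```python
import random
from statistics import median


def _pairwise_within(values):
    """Absolute differences over all unordered pairs in one segment."""
    return [abs(values[i] - values[j])
            for i in range(len(values)) for j in range(i + 1, len(values))]


def median_divergence(left, right):
    """Hypothetical median-based analogue of the energy divergence (exponent 1).

    Each average of pairwise distances is replaced by a median, so a few
    extreme points shift the value only slightly.
    """
    between = median([abs(x - y) for x in left for y in right])
    return 2.0 * between - median(_pairwise_within(left)) - median(_pairwise_within(right))


def locate_breakout(z, min_size=5):
    """Return (tau, stat) for the split maximizing the scaled median divergence.

    Naive O(n^2 log n) work per split: fine for short illustrative series only.
    """
    if min_size < 2 or len(z) < 2 * min_size:
        raise ValueError("need min_size >= 2 and len(z) >= 2 * min_size")
    best_tau, best_stat = None, float("-inf")
    for tau in range(min_size, len(z) - min_size + 1):
        left, right = z[:tau], z[tau:]
        m, k = len(left), len(right)
        stat = m * k / (m + k) * median_divergence(left, right)
        if stat > best_stat:
            best_tau, best_stat = tau, stat
    return best_tau, best_stat


if __name__ == "__main__":
    rng = random.Random(7)
    series = [rng.gauss(0.0, 1.0) for _ in range(30)] + [rng.gauss(4.0, 1.0) for _ in range(30)]
    for i in (3, 12, 41, 55):  # hypothetical injected anomalies (spikes)
        series[i] += 40.0
    print(locate_breakout(series))
```

In the example, the four injected spikes of +40 are expected to leave the estimate near the level shift at index 30. Each spike touches only a small fraction of the pairwise distances that enter each median.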
