Approximate Quantiles for Datacenter Telemetry Monitoring
Gangmuk Lim, Mohamed Hassan, Ze Jin, Stavros Volos, Myeongjae Jeon

TL;DR
This paper introduces AOMG, an efficient algorithm for real-time quantile approximation in datacenter telemetry, achieving high accuracy and throughput with minimal memory use by leveraging workload-driven insights.
Contribution
AOMG is a novel quantile approximation method that significantly reduces memory and computation while maintaining high accuracy, tailored for datacenter telemetry streaming analytics.
Findings
AOMG achieves less than 5% relative value error across various use cases.
AOMG outperforms state-of-the-art algorithms in throughput and accuracy.
AOMG reduces memory footprint through compression and summarization techniques.
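To make the "less than 5% relative value error" finding concrete, here is a minimal sketch of how such an error could be measured. It assumes relative value error means |approximate − exact| / |exact| for a given quantile, and it uses the nearest-rank definition for the exact quantile. Both choices, and the helper names `exact_quantile` and `relative_value_error`, are illustrative assumptions rather than definitions taken from the paper.

```python
import math


def exact_quantile(values, q):
    """Exact q-quantile (0 < q <= 1) using the nearest-rank definition."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))  # 1-based rank
    return ordered[rank - 1]


def relative_value_error(approx, exact):
    """Assumed metric: |approx - exact| / |exact|."""
    if exact == 0:
        raise ValueError("relative error is undefined for exact == 0")
    return abs(approx - exact) / abs(exact)


# Hypothetical numbers, for illustration only.
samples = [10, 20, 30, 40, 50]
median = exact_quantile(samples, 0.5)
error = relative_value_error(31.0, median)
```

Under this assumed definition, an estimate of 31.0 against an exact median of 30 has an error of about 3.3%, which would fall within a 5% bound.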
Abstract
Datacenter systems require efficient troubleshooting and effective resource scheduling to minimize downtime and make efficient use of limited resources. To this end, datacenter operators employ streaming analytics to collect and process datacenter telemetry over a temporal window. The quantile operator is key to these systems because it can summarize both the typical and the abnormal behavior of the monitored system. Computing quantiles in real time is resource-intensive, as it requires processing hundreds of millions of events in seconds while providing high accuracy. We overcome these challenges in real-time quantile computation through workload-driven approximation, motivated by three insights in our study: (i) values are dominated by a set of recurring small values, (ii) distribution of small values is consistent across different time scales, and (iii) tail values are dominated by…
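Insight (i) hints at why memory can be saved: if a window is dominated by a few recurring values, storing one counter per distinct value is much smaller than storing every event. The sketch below computes a quantile from such a frequency map. It is a generic illustration under that assumption, not the AOMG algorithm itself; the function name `quantile_from_counts` and the sample latencies are hypothetical.

```python
import math
from collections import Counter


def quantile_from_counts(counts, q):
    """Nearest-rank q-quantile (0 < q <= 1) from a value -> frequency map."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("window is empty")
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    rank = max(1, math.ceil(q * total))  # 1-based rank in sorted order
    seen = 0
    for value in sorted(counts):
        seen += counts[value]
        if seen >= rank:
            return value
    raise AssertionError("unreachable: rank never exceeds total")


# Hypothetical latency samples (ms) in one window: mostly small, recurring values.
window = Counter()
for latency_ms in [1, 1, 2, 1, 2, 1, 1, 3, 250, 1]:
    window[latency_ms] += 1

median = quantile_from_counts(window, 0.5)
maximum = quantile_from_counts(window, 1.0)
```

Here ten events collapse into four counters. The saving grows with the share of recurring values, though this simple map remains exact and would still need a separate strategy for the rare tail values.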
Taxonomy
Topics: Cloud Computing and Resource Management · Advanced Data Storage Technologies · Distributed and Parallel Computing Systems
