Metric Criticality Identification for Cloud Microservices
Akanksha Singal, Divya Pathak, Kaustabha Ray, Felix George, Mudit Verma, Pratibha Moogi

TL;DR
KIMetrix is a data-driven system that automatically identifies minimal, critical metric subsets for effective monitoring of microservice architectures, reducing manual effort and improving reliability.
Contribution
We introduce KIMetrix, a novel information-theoretic approach that automatically selects essential metrics for microservice monitoring without requiring expert input or extensive logs.
Findings
Effectively reduces the number of metrics needed for monitoring.
Maintains comprehensive system coverage with fewer metrics.
Operates efficiently using lightweight metrics and traces.
Abstract
Modern cloud-native applications built on microservice architectures present unprecedented challenges for system monitoring and alerting. Site Reliability Engineers (SREs) face the daunting task of defining effective monitoring strategies across a multitude of metrics to ensure system reliability, work that traditionally requires extensive manual expertise. The distributed nature of microservices, characterized by stochastic execution patterns and intricate inter-service dependencies, renders the traditional manual approach of navigating the vast metrics landscape computationally and operationally prohibitive. To address this challenge, we propose KIMetrix, a data-driven system that automatically identifies minimal yet comprehensive metric subsets to aid SREs in monitoring microservice applications. KIMetrix leverages information-theoretic measures, specifically entropy…
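The abstract names entropy as one of the information-theoretic measures involved, but the listing is cut off before it describes how the measure is used. As a rough illustration of the general idea only, and not KIMetrix's actual algorithm, the sketch below computes the Shannon entropy of a metric's time series after equal-width binning and ranks metrics by it. The function name `shannon_entropy`, the bin count, and the sample metric names and values are all assumptions made up for this example.

```python
import math
from collections import Counter


def shannon_entropy(samples, bins=4):
    """Shannon entropy (in bits) of a numeric series after equal-width binning.

    Illustrative helper only; the binning scheme is an assumption,
    not something taken from the paper.
    """
    lo, hi = min(samples), max(samples)
    if hi == lo:
        # A constant series carries no information.
        return 0.0
    width = (hi - lo) / bins
    # Map each sample to a bin index; clamp the maximum into the last bin.
    labels = [min(int((x - lo) / width), bins - 1) for x in samples]
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


# Hypothetical metric samples, invented for illustration.
metrics = {
    "cpu_pct": [10, 90, 40, 70, 20, 80, 50, 30],
    "open_fds": [128, 128, 128, 128, 128, 128, 128, 128],
    "latency_ms": [10, 10, 10, 10, 10, 10, 10, 250],
}

# Rank metrics from most to least variable in the entropy sense.
ranked = sorted(metrics, key=lambda m: shannon_entropy(metrics[m]), reverse=True)
for name in ranked:
    print(f"{name}: {shannon_entropy(metrics[name]):.3f} bits")
```

In this toy data the constant `open_fds` series scores zero, which is the intuition behind dropping metrics that never change. Entropy alone does not capture redundancy between metrics or dependencies between services, so a real selection method would need more than this ranking.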
Taxonomy
Topics: Software System Performance and Reliability · Cloud Computing and Resource Management · IoT and Edge/Fog Computing
