A Holistic Framework for Automated Configuration Recommendation for Cloud Service Monitoring
Anson Bastos, Shreeya Venneti, Anjaly Parayil, Ayush Choure, Chetan Bansal, Rujia Wang

TL;DR
This paper introduces a comprehensive, automated framework for recommending optimal cloud service monitor configurations, aiming to improve reliability and reduce operational overhead in large-scale cloud environments.
Contribution
It presents a novel modular recommendation framework that processes service entity graphs to automate and enhance monitor configuration in cloud services.
Findings
Effective in suggesting relevant monitor configurations
Reduces manual effort and operational overhead
Validated through extensive experiments at Microsoft
Abstract
Reliability of large-scale cloud services is critical for user satisfaction and business continuity. Despite significant investments in reliability engineering, production incidents remain inevitable, often leading to customer impact and operational overhead. In large cloud companies, multiple services are deployed across regions, necessitating robust health monitoring systems. However, the current monitor configuration process is manual, largely reactive and ad hoc, resulting in gaps in coverage and redundant alerts. In this paper, we present a comprehensive study of monitor creation in Microsoft, identifying key components in the existing process. We further design a modular recommendation framework that processes the graph-structured service entities to suggest optimal monitor configurations. Through extensive experimentation on historical data and user study of recommendations for…
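The abstract does not describe the recommendation algorithm itself. As a rough illustration of the general idea of suggesting monitors from a graph of related service entities, the Python sketch below uses a simple neighbor-voting heuristic: a monitor is suggested for a service when enough related entities already have it and the service does not. The function `recommend_monitors`, its `top_k` and `min_support` parameters, and the toy monitor strings are all hypothetical; this is not the authors' framework.

```python
from collections import Counter

def recommend_monitors(graph, monitors, target, top_k=3, min_support=2):
    """Hypothetical neighbor-voting sketch, not the paper's method.

    graph:    dict mapping an entity to a list of related entities
    monitors: dict mapping an entity to the set of monitors it already has
    Returns up to top_k monitors used by at least min_support neighbors
    of `target` that `target` itself does not yet have.
    """
    existing = monitors.get(target, set())
    votes = Counter()
    for neighbor in graph.get(target, []):
        for monitor in monitors.get(neighbor, set()):
            if monitor not in existing:
                votes[monitor] += 1  # one vote per neighbor using this monitor
    # Keep well-supported candidates; rank by vote count, then by name for determinism.
    candidates = [(m, c) for m, c in votes.items() if c >= min_support]
    candidates.sort(key=lambda mc: (-mc[1], mc[0]))
    return [m for m, _ in candidates[:top_k]]

# Toy service entity graph (hypothetical data).
graph = {"svc-a": ["svc-b", "svc-c", "svc-d"]}
monitors = {
    "svc-a": {"cpu>90%"},
    "svc-b": {"cpu>90%", "latency_p99>500ms", "error_rate>1%"},
    "svc-c": {"latency_p99>500ms", "error_rate>1%"},
    "svc-d": {"latency_p99>500ms", "disk>85%"},
}
print(recommend_monitors(graph, monitors, "svc-a"))
# ['latency_p99>500ms', 'error_rate>1%']
```

In this toy setup, `min_support` trades coverage against noise: a low threshold surfaces more candidate monitors, while a higher one suggests only monitors that several related services already rely on.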
Taxonomy
Topics: Software System Performance and Reliability · Cloud Computing and Resource Management · Service-Oriented Architecture and Web Services
