MicroRes: Versatile Resilience Profiling in Microservices via Degradation Dissemination Indexing
Tianyi Yang, Cheryl Lee, Jiacheng Shen, Yuxin Su, Yongqiang Yang, and Michael R. Lyu

TL;DR
MicroRes is a novel framework that assesses microservice resilience by analyzing how failures propagate from system metrics to user experience metrics, enabling efficient and versatile resilience profiling.
Contribution
MicroRes introduces a new degradation dissemination indexing method for resilience profiling, reducing manual effort and increasing adaptability across microservice systems.
Findings
Effective resilience profiling demonstrated on multiple microservice systems.
MicroRes accurately ranks metrics by their impact on service degradation.
Practical deployment in production environments shows its real-world applicability.
Abstract
Microservice resilience, the ability of microservices to recover from failures and continue providing reliable and responsive services, is crucial for cloud vendors. However, current practice relies on manually configured rules specific to a particular microservice system, which is labor-intensive and inflexible given the large scale and highly dynamic nature of microservices. A more labor-efficient and versatile solution is desired. Our insight is that a resilient deployment effectively prevents degradation from disseminating from system performance metrics to user-aware metrics, and the latter directly affect service quality. In other words, failures in a non-resilient deployment can impact both types of metrics, leading to user dissatisfaction. With this in mind, we propose MicroRes, the first versatile resilience profiling framework for microservices via degradation…
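The abstract stops before describing how MicroRes actually computes its index. To make the stated insight concrete, here is a minimal, hypothetical Python sketch, not the MicroRes algorithm. It assumes each metric is sampled in a fault-free baseline window and in a fault-injection window, scores degradation as a z-score-style shift of the mean, and uses the ratio of user-aware degradation to system-metric degradation as a toy stand-in for "degradation dissemination". The function names `degradation` and `dissemination_ratio` and all sample values are illustrative assumptions.

```python
from statistics import mean, pstdev


def degradation(baseline, faulty):
    """Hypothetical score: shift of the faulty-window mean from the
    baseline mean, measured in baseline standard deviations."""
    mu = mean(baseline)
    sigma = pstdev(baseline) or 1.0  # avoid division by zero for flat baselines
    return abs(mean(faulty) - mu) / sigma


def dissemination_ratio(system_metric, user_metric):
    """Hypothetical index: how much of the system-level degradation
    shows up in a user-aware metric. Each argument is a
    (baseline_samples, faulty_samples) pair."""
    sys_deg = degradation(*system_metric)
    user_deg = degradation(*user_metric)
    if sys_deg == 0:
        return 0.0  # no system-level degradation, so nothing can disseminate
    return user_deg / sys_deg


# Illustrative data: CPU usage (%) spikes under an injected fault.
cpu = ([38, 42, 38, 42], [90, 90, 90, 90])
# Resilient deployment: latency (ms) stays near its baseline.
latency_resilient = ([100, 110, 100, 110], [105, 110, 100, 105])
# Non-resilient deployment: latency degrades as much as CPU does.
latency_fragile = ([100, 110, 100, 110], [230, 230, 230, 230])

print(dissemination_ratio(cpu, latency_resilient))  # 0.0
print(dissemination_ratio(cpu, latency_fragile))    # 1.0
```

A ratio near 0 matches the resilient case described in the abstract (system degradation does not reach users), while a ratio near or above 1 matches the non-resilient case. The sketch does not pick a cutoff between the two, since the abstract does not specify one.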
Taxonomy
Topics: Software System Performance and Reliability · Cloud Computing and Resource Management · IoT and Edge/Fog Computing
