C-Koordinator: Interference-aware Management for Large-scale and Co-located Microservice Clusters
Shengye Song, Minxian Xu, Zuowei Zhang, Chengxi Gao, Fansong Zeng, Yu Ding, Kejiang Ye, Chengzhong Xu

TL;DR
This paper introduces C-Koordinator, an interference-aware management system for large-scale co-located microservice clusters, improving resource utilization and reducing latency through CPI-based interference prediction.
Contribution
The paper presents a novel interference mitigation platform, C-Koordinator, utilizing CPI metrics for accurate interference prediction and management in Alibaba's microservice clusters.
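As a rough illustration of the metric named above, CPI is the number of CPU cycles consumed divided by the number of instructions retired over a sampling window. The sketch below is not C-Koordinator's prediction model. The function names, the baseline comparison, and the 0.2 threshold are all hypothetical, chosen only to show how a rise in CPI relative to a known-good baseline could be read as a sign of interference.

```python
def cpi(cycles: int, instructions: int) -> float:
    """Cycles per instruction over one sampling window."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return cycles / instructions


def relative_cpi_increase(observed: float, baseline: float) -> float:
    """Fractional increase of the observed CPI over a baseline CPI."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (observed - baseline) / baseline


def looks_interfered(observed: float, baseline: float, threshold: float = 0.2) -> bool:
    """Hypothetical rule: flag a window whose CPI exceeds the baseline by more than `threshold`."""
    return relative_cpi_increase(observed, baseline) > threshold


if __name__ == "__main__":
    # Illustrative counter values, not measurements from the paper.
    window_cpi = cpi(cycles=3000, instructions=2000)
    print(window_cpi, looks_interfered(window_cpi, baseline=1.0))
```

A fixed threshold is the simplest possible choice here. The summary describes learned prediction models, which this sketch does not attempt to reproduce.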
Findings
Interference prediction models achieve over 90.3% accuracy.
Application latency reduced by 16.7% to 36.1%.
System maintains stable performance under various loads.
Abstract
Microservices transform traditional monolithic applications into lightweight, loosely coupled application components and have been widely adopted in many enterprises. Cloud platform infrastructure providers enhance the resource utilization efficiency of microservices systems by co-locating different microservices. However, this approach also introduces resource competition and interference among microservices. Designing interference-aware strategies for large-scale, co-located microservice clusters is crucial for enhancing resource utilization and mitigating competition-induced interference. These challenges are further exacerbated by unreliable metrics, application diversity, and node heterogeneity. In this paper, we first analyze the characteristics of large-scale and co-located microservices clusters at Alibaba and further discuss why cycles per instruction (CPI) is adopted as a…
Taxonomy
Topics: Software System Performance and Reliability · Cloud Computing and Resource Management · Software-Defined Networks and 5G
