Sinan: Data-Driven, QoS-Aware Cluster Management for Microservices
Yanqi Zhang, Weizhe Hua, Zhuangzhuang Zhou, Edward Suh, Christina Delimitrou

TL;DR
Sinan is a scalable, data-driven cluster management system for microservices that uses machine learning to optimize resource allocation, ensuring QoS compliance while maximizing resource efficiency in cloud environments.
Contribution
This paper introduces Sinan, an online, QoS-aware cluster manager that uses scalable ML models to improve resource management for microservices and to mitigate the cascading QoS violations caused by dependencies between tiers.
Findings
Sinan consistently meets QoS targets in diverse deployments.
It keeps cluster utilization high, in contrast to prior approaches.
The ML models provide explainable insights for better application deployment.
Abstract
Cloud applications are increasingly shifting from large monolithic services to large numbers of loosely-coupled, specialized microservices. Despite their advantages in terms of facilitating development, deployment, modularity, and isolation, microservices complicate resource management, as dependencies between them introduce backpressure effects and cascading QoS violations. We present Sinan, a data-driven cluster manager for interactive cloud microservices that is online and QoS-aware. Sinan leverages a set of scalable and validated machine learning models to determine the performance impact of dependencies between microservices, and allocate appropriate resources per tier in a way that preserves the end-to-end tail latency target. We evaluate Sinan both on dedicated local clusters and large-scale deployments on Google Compute Engine (GCE) across representative end-to-end…
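The abstract describes predicting how inter-tier dependencies affect end-to-end tail latency, then sizing each tier so that latency stays under the QoS target. The toy sketch below illustrates only that general loop, under loud assumptions: the three tiers, their per-core service rates, the M/M/1 queueing formula standing in for a learned latency predictor, and the greedy core-by-core allocator are all hypothetical. They are not Sinan's actual ML models or scheduler.

```python
import math
from typing import Dict, Optional

# HYPOTHETICAL three-tier request path; per-core service rates (requests/s)
# are made-up numbers for illustration, not values from the Sinan paper.
SERVICE_RATE: Dict[str, int] = {"frontend": 500, "logic": 250, "db": 400}


def predict_p99(cores: Dict[str, int], rps: int) -> float:
    """Stand-in for a learned end-to-end tail-latency predictor (an assumption).

    Each tier is treated as an M/M/1 queue whose service rate scales with its
    cores; its p99 sojourn time is ln(100) / (mu - rps). Per-tier p99 values
    are summed along the path as a crude end-to-end estimate. A saturated tier
    (mu <= rps) makes the whole path infinitely slow, a simple picture of how
    one overloaded dependency causes an end-to-end QoS violation.
    """
    total = 0.0
    for tier, rate in SERVICE_RATE.items():
        mu = cores[tier] * rate
        if mu <= rps:
            return math.inf
        total += math.log(100) / (mu - rps)
    return total


def allocate(rps: int, qos_target: float, budget: int) -> Optional[Dict[str, int]]:
    """Greedily add cores until the predicted p99 meets the QoS target.

    Returns a per-tier core allocation, or None if the target cannot be met
    within `budget` total cores. (Hypothetical allocator, not Sinan's.)
    """
    # Start from the smallest allocation that keeps every tier unsaturated.
    cores = {tier: rps // rate + 1 for tier, rate in SERVICE_RATE.items()}
    if sum(cores.values()) > budget:
        return None
    while predict_p99(cores, rps) > qos_target:
        if sum(cores.values()) >= budget:
            return None  # QoS target unreachable within the core budget
        # Give one core to the tier whose upgrade lowers predicted latency most.
        best = min(
            SERVICE_RATE,
            key=lambda t: predict_p99({**cores, t: cores[t] + 1}, rps),
        )
        cores[best] += 1
    return cores


if __name__ == "__main__":
    alloc = allocate(rps=1000, qos_target=0.020, budget=40)
    print(alloc, predict_p99(alloc, 1000) if alloc else None)
```

Because each tier's predicted latency in this toy model is convex and decreasing in its core count, adding one core at a time to the tier with the largest predicted improvement is a reasonable way to approach the cheapest allocation that meets the target. A real learned predictor would not guarantee that property, so the greedy step is purely illustrative.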
Taxonomy
Topics: Cloud Computing and Resource Management · IoT and Edge/Fog Computing · Software System Performance and Reliability
