Practice of Alibaba Cloud on Elastic Resource Provisioning for Large-scale Microservices Cluster
Minxian Xu, Lei Yang, Yang Wang, Chengxi Gao, Linfeng Wen, Guoyao Xu, Liping Zhang, Kejiang Ye, Chengzhong Xu

TL;DR
This paper discusses Alibaba Cloud's strategies for efficiently provisioning resources in large-scale microservice clusters, focusing on optimizing resource utilization and maintaining latency requirements through advanced algorithms.
Contribution
It introduces Alibaba's resource provisioning framework and proposes enhanced algorithms that improve resource usage by 10-15% while still meeting the latency requirements of microservices.
Findings
Resource usage improved by 10-15%
Enhanced algorithms balance proactive and reactive scheduling
Maintains latency requirements in large-scale clusters
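The idea of balancing proactive and reactive scheduling can be illustrated with a minimal sketch. This is not the algorithm from the paper, whose details are not given on this page. It assumes a hypothetical load forecast in requests per second, a hypothetical per-replica capacity, and a target CPU utilization of 60%. All function names and parameters are illustrative. The proactive part sizes the service for the forecast load, the reactive part corrects from the utilization observed right now (the same proportional rule used by the Kubernetes Horizontal Pod Autoscaler), and the larger of the two wins so that neither a bad forecast nor a sudden spike leaves the service under-provisioned.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoids floating-point rounding surprises."""
    return (a + b - 1) // b


def proactive_replicas(forecast_rps: int, rps_per_replica: int) -> int:
    """Replicas needed to serve the forecast load (hypothetical inputs)."""
    return ceil_div(forecast_rps, rps_per_replica)


def reactive_replicas(current_replicas: int, cpu_percent: int,
                      target_percent: int) -> int:
    """Replicas needed to bring observed CPU utilization back to the target."""
    return ceil_div(current_replicas * cpu_percent, target_percent)


def desired_replicas(forecast_rps: int, rps_per_replica: int,
                     current_replicas: int, cpu_percent: int,
                     target_percent: int = 60, min_replicas: int = 1) -> int:
    """Take the larger of the proactive and reactive estimates (an assumption,
    chosen here to favor latency over resource savings)."""
    return max(
        min_replicas,
        proactive_replicas(forecast_rps, rps_per_replica),
        reactive_replicas(current_replicas, cpu_percent, target_percent),
    )


# Forecast asks for 9 replicas, but the 8 running replicas are at 90% CPU,
# so the reactive estimate (12) dominates.
spike = desired_replicas(forecast_rps=900, rps_per_replica=100,
                         current_replicas=8, cpu_percent=90)

# Current load is light (reactive estimate 7), but the forecast expects a
# surge, so the proactive estimate (20) dominates.
surge = desired_replicas(forecast_rps=2000, rps_per_replica=100,
                         current_replicas=8, cpu_percent=50)
```

Taking the maximum is the conservative choice. A production scheduler would also need scale-down cooldowns and per-service latency targets, which this sketch leaves out.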
Abstract
Cloud-native architecture is becoming increasingly crucial for today's cloud computing environments due to the need for speed and flexibility in developing applications. It utilizes microservice technology to break down traditional monolithic applications into light-weight and self-contained microservice components. However, as microservices grow in scale and have dynamic inter-dependencies, they also pose new challenges in resource provisioning that cannot be fully addressed by traditional resource scheduling approaches. The various microservices with different resource needs and latency requirements can create complex calling chains, making it difficult to provide fine-grained and accurate resource allocation to each component while maintaining the overall quality of service in the chain. In this work, we aim to address the research problem on how to efficiently provision resources…
Taxonomy
Topics: Cloud Computing and Resource Management · IoT and Edge/Fog Computing · Software System Performance and Reliability
