Affinity-Aware Resource Provisioning for Long-Running Applications in Shared Clusters
Clement Mommessin, Renyu Yang, Natalia V. Shakhlevich, Xiaoyang Sun, Satish Kumar, Junqing Xiao, Jie Xu

TL;DR
This paper introduces an affinity-aware resource provisioning method for long-running applications in shared clusters. It aims to minimize the number of compute nodes while respecting application constraints, which reduces operational costs and carbon footprint.
Contribution
It proposes a novel affinity-aware scheduling approach tailored to LRAs with multiple constraints and evaluates several algorithms on real-world data.
Findings
Algorithms achieve competitive effectiveness
Reduced number of compute nodes used
Efficient scheduling in large-scale scenarios
Abstract
Resource provisioning plays a pivotal role in determining the right amount of infrastructure resources to run applications and in meeting the global decarbonization goal. A significant portion of production clusters is now dedicated to long-running applications (LRAs), which typically take the form of microservices and run for hours or even months. It is therefore practically important to plan the placement of LRAs in a shared cluster ahead of time so that the number of compute nodes they require is minimized, reducing carbon footprint and lowering operational costs. Existing works on LRA scheduling are often application-agnostic and do not specifically address the constraining requirements imposed by LRAs, such as co-location affinity constraints and time-varying resource requirements. In this paper, we present an affinity-aware resource provisioning approach for deploying…
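The abstract frames LRA placement as minimizing the number of compute nodes subject to co-location affinity constraints and time-varying resource requirements, which is a variant of bin packing. As a rough illustration only, the sketch below uses a plain first-fit-decreasing heuristic; it is not the paper's algorithm. The single-resource demand profile per time slot, the anti-affinity-only constraint model, and the names `Container`, `Node`, and `first_fit_decreasing` are hypothetical assumptions.

```python
from dataclasses import dataclass, field

EPS = 1e-9  # tolerance for floating-point capacity checks


@dataclass
class Container:
    """Hypothetical LRA container: one resource, demand given per time slot."""
    name: str
    app: str
    demand: list[float]
    # Apps this container must not share a node with (list its own app
    # to spread replicas of the same LRA across nodes).
    anti_affinity: set[str] = field(default_factory=set)


@dataclass
class Node:
    """Hypothetical compute node tracking load in every time slot."""
    capacity: float
    horizon: int
    load: list[float] = field(init=False)
    apps: set[str] = field(default_factory=set)
    banned: set[str] = field(default_factory=set)  # union of placed containers' anti-affinity sets
    placed: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.load = [0.0] * self.horizon

    def fits(self, c: Container) -> bool:
        # Enforce anti-affinity in both directions.
        if c.app in self.banned or c.anti_affinity & self.apps:
            return False
        # Time-varying demand: capacity must hold in every slot, not just at the peak.
        return all(l + d <= self.capacity + EPS for l, d in zip(self.load, c.demand))

    def place(self, c: Container) -> None:
        self.load = [l + d for l, d in zip(self.load, c.demand)]
        self.apps.add(c.app)
        self.banned |= c.anti_affinity
        self.placed.append(c.name)


def first_fit_decreasing(containers: list[Container], capacity: float, horizon: int) -> list[Node]:
    """Greedy baseline: largest peak demand first, first node that fits, else open a new node."""
    nodes: list[Node] = []
    for c in sorted(containers, key=lambda c: max(c.demand), reverse=True):
        target = next((n for n in nodes if n.fits(c)), None)
        if target is None:
            target = Node(capacity, horizon)
            nodes.append(target)
        target.place(c)
    return nodes


containers = [
    Container("a1", "web", [0.6, 0.2], {"web"}),
    Container("a2", "web", [0.6, 0.2], {"web"}),
    Container("b1", "batch", [0.3, 0.7]),
    Container("c1", "db", [0.4, 0.4]),
]
nodes = first_fit_decreasing(containers, capacity=1.0, horizon=2)
for i, n in enumerate(nodes):
    print(i, n.placed)
# 0 ['b1', 'a1']
# 1 ['a2', 'c1']
```

Because `b1` and `a1` peak in different time slots, they share a node even though their peak demands sum to 1.3; a placement based only on peak demand would need an extra node. The two `web` replicas end up on different nodes because of their self anti-affinity.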
Taxonomy
Topics: Cloud Computing and Resource Management · Software System Performance and Reliability · IoT and Edge/Fog Computing
