Affinity-Aware Resource Provisioning for Long-Running Applications in Shared Clusters

Clement Mommessin; Renyu Yang; Natalia V. Shakhlevich; Xiaoyang Sun; Satish Kumar; Junqing Xiao; Jie Xu

arXiv:2208.12738·cs.DC·August 29, 2022·1 cites

TL;DR

This paper introduces an affinity-aware resource provisioning method for long-running applications in shared clusters. It aims to minimize the number of compute nodes used while respecting application constraints, thereby reducing operational costs and carbon footprint.

Contribution

The paper proposes a novel affinity-aware scheduling approach tailored to LRAs with multiple constraints, and evaluates several algorithms on real-world data.

Findings

01 Algorithms achieve competitive effectiveness
02 Reduced number of compute nodes used
03 Efficient scheduling in large-scale scenarios

Abstract

Resource provisioning plays a pivotal role in determining the right amount of infrastructure resource to run applications and target the global decarbonization goal. A significant portion of production clusters is now dedicated to long-running applications (LRAs), which are typically in the form of microservices and executed in the order of hours or even months. It is therefore practically important to plan ahead the placement of LRAs in a shared cluster so that the number of compute nodes required by them can be minimized to reduce carbon footprint and lower operational costs. Existing works on LRA scheduling are often application-agnostic, without particularly addressing the constraining requirements imposed by LRAs, such as co-location affinity constraints and time-varying resource requirements. In this paper, we present an affinity-aware resource provisioning approach for deploying…
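To make the problem statement concrete, the following is a minimal sketch of node-minimizing placement under a co-location constraint. It is not the paper's algorithm: it uses a generic first-fit decreasing heuristic, a single static resource demand per replica (no time-varying requirements), and a simple pairwise anti-affinity model. The function name `first_fit_decreasing`, the `conflicts` representation, and the example applications are all hypothetical and chosen only for illustration.

```python
def first_fit_decreasing(replicas, capacity, conflicts):
    """Place replicas on as few nodes as a first-fit decreasing pass finds.

    replicas  : list of (app_name, demand) tuples, one per replica
    capacity  : resource capacity of every node (assumed identical nodes)
    conflicts : set of frozensets; frozenset({a, b}) means apps a and b
                must not share a node, frozenset({a}) means two replicas
                of app a must not share a node (assumed constraint model)
    Returns a list of nodes, each a dict with the replicas placed on it.
    """
    nodes = []
    # Largest demands first; sorted() is stable, so ties keep input order.
    for app, demand in sorted(replicas, key=lambda r: -r[1]):
        if demand > capacity:
            raise ValueError(f"replica of {app!r} exceeds node capacity")
        for node in nodes:
            has_room = node["free"] >= demand
            compatible = all(
                frozenset((app, other)) not in conflicts
                for other in node["apps"]
            )
            if has_room and compatible:
                break  # first node that fits wins
        else:
            # No existing node fits: open a new one.
            node = {"free": capacity, "apps": set(), "placed": []}
            nodes.append(node)
        node["free"] -= demand
        node["apps"].add(app)
        node["placed"].append((app, demand))
    return nodes


if __name__ == "__main__":
    demo = [("web", 4), ("web", 4), ("db", 6), ("cache", 2)]
    # "web" and "db" may not be co-located in this made-up scenario.
    placement = first_fit_decreasing(demo, 10, {frozenset(("web", "db"))})
    for i, node in enumerate(placement):
        print(i, node["placed"])
```

A greedy pass like this gives no optimality guarantee; it only illustrates how capacity and co-location checks combine when deciding whether a replica can join an existing node or needs a new one.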

Code & Models

Repositories

dssgroup-leeds/lra-binpacking-expe (Official)

Taxonomy

Topics: Cloud Computing and Resource Management · Software System Performance and Reliability · IoT and Edge/Fog Computing