ARC-V: Vertical Resource Adaptivity for HPC Workloads in Containerized Environments
Daniel Medeiros, Jeremy J. Williams, Jacob Wahlgren, Leonardo Saud, Maia Leite, Ivy Peng

TL;DR
This paper introduces ARC-V, a vertical autoscaling policy for HPC workloads in containerized environments. It addresses the inefficiencies of existing autoscalers by using knowledge of application memory consumption patterns to provision memory elastically.
Contribution
The paper presents ARC-V, an autoscaling approach that improves memory efficiency and prevents out-of-memory errors for HPC applications in Kubernetes, where existing vertical autoscalers were built for cloud applications.
Findings
ARC-V reduces memory usage compared to standard VPA.
ARC-V prevents out-of-memory errors in HPC workloads.
ARC-V effectively adapts to HPC memory consumption patterns.
Abstract
Existing state-of-the-art vertical autoscalers for containerized environments are traditionally built for cloud applications, which might behave differently than HPC workloads with their dynamic resource consumption. In these environments, autoscalers may create an inefficient resource allocation. This work analyzes nine representative HPC applications with different memory consumption patterns. Our results identify the limitations and inefficiencies of the Kubernetes Vertical Pod Autoscaler (VPA) for enabling memory elastic execution of HPC applications. We propose, implement, and evaluate ARC-V. This policy leverages both in-flight resource updates of pods in Kubernetes and the knowledge of memory consumption patterns of HPC applications for achieving elastic memory resource provisioning at the node level. Our results show that ARC-V can effectively save memory while eliminating…
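The abstract does not spell out ARC-V's decision rules, so the following is only a minimal sketch of what a usage-driven vertical memory policy could look like, not the authors' algorithm. The function name `recommend_memory_limit`, the thresholds, and the headroom factor are all hypothetical values chosen for illustration. The returned value stands in for the memory limit that a controller would then apply to a running pod through an in-flight resource update.

```python
from typing import Sequence


def recommend_memory_limit(
    usage_mib: Sequence[float],
    current_limit_mib: float,
    headroom: float = 0.25,         # hypothetical safety margin above the observed peak
    grow_threshold: float = 0.9,    # hypothetical: grow when peak nears the limit
    shrink_threshold: float = 0.5,  # hypothetical: shrink when peak is far below the limit
) -> float:
    """Return a new memory limit (MiB) from recent usage samples.

    Illustrative policy only; it is NOT ARC-V's published algorithm.
    """
    if not usage_mib:
        # No observations yet: keep the current allocation.
        return current_limit_mib

    peak = max(usage_mib)
    target = peak * (1.0 + headroom)

    if peak >= grow_threshold * current_limit_mib:
        # Usage is close to the limit: raise it to reduce the risk of an OOM kill.
        return max(current_limit_mib, target)
    if peak <= shrink_threshold * current_limit_mib:
        # Usage is well below the limit: release memory back to the node.
        return target
    # Usage sits in the comfortable band: leave the limit unchanged.
    return current_limit_mib


if __name__ == "__main__":
    samples = [620.0, 700.0, 950.0]  # made-up usage trace in MiB
    print(recommend_memory_limit(samples, current_limit_mib=1000.0))
```

A real policy would also need to account for the shape of the memory trace over time (for example, steady growth versus periodic spikes), which is the kind of pattern knowledge the abstract says ARC-V exploits.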
Taxonomy
Topics: Distributed and Parallel Computing Systems · Parallel Computing and Optimization Techniques · Cloud Computing and Resource Management
