BOA Constrictor: Squeezing Performance out of GPUs in the Cloud via Budget-Optimal Allocation
Zhouzi Li, Cindy Zhu, Arpan Mukhopadhyay, Mor Harchol-Balter, Benjamin Berg

TL;DR
BOA Constrictor is a scheduler that decides how many cloud GPUs to rent and how to allocate them among ML training jobs, minimizing average job completion time while staying within a fixed budget.
Contribution
The paper introduces a budget-constrained policy for renting and allocating cloud GPUs that outperforms existing heuristic policies.
Findings
Reduces average job completion time by 1.6x in small-scale experiments.
Achieves 2x faster job completion in large-scale simulations.
Demonstrates effective balancing of cost and performance in GPU scheduling.
Abstract
The past decade has seen a dramatic increase in demand for GPUs to train Machine Learning (ML) models. Because it is prohibitively expensive for most organizations to build and maintain a large GPU cluster, organizations instead choose to rent GPUs from cloud providers. The customer is responsible for devising a policy for (i) deciding how many GPUs to rent at every moment in time to process a stream of ML training jobs and (ii) allocating the rented GPUs among the currently active jobs in the system. Because ML training jobs can be parallelized across different numbers of GPUs, the customer generally has many options for how many GPUs to use for each job. Allocating more GPUs to a single training job will cause the job to complete more quickly. However, the customer pays for each GPU-hour they use, and a training job receives a diminishing marginal benefit from running on additional…
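The abstract describes a per-job tradeoff: more GPUs finish a job sooner, but each extra GPU adds less speedup while still being billed by the GPU-hour. The toy calculation below makes that tradeoff concrete. It is a minimal sketch under stated assumptions, not the BOA Constrictor policy. It assumes a hypothetical speedup curve s(k) = k^p with p = 0.5, which is not taken from the paper. It also assumes a job needing W single-GPU hours runs for W / s(k) hours on k GPUs. All function names are illustrative.

```python
# Toy model of the GPU-count tradeoff described in the abstract.
# ASSUMPTION: hypothetical sublinear speedup s(k) = k**p with 0 < p < 1.
# The paper's actual speedup model is not given in this summary.

def speedup(k: int, p: float = 0.5) -> float:
    """Hypothetical concave speedup from running one job on k GPUs."""
    return k ** p


def runtime_hours(work: float, k: int, p: float = 0.5) -> float:
    """Wall-clock hours for a job that needs `work` single-GPU hours."""
    return work / speedup(k, p)


def cost_gpu_hours(work: float, k: int, p: float = 0.5) -> float:
    """GPU-hours billed: k GPUs rented for the job's entire runtime."""
    return k * runtime_hours(work, k, p)


def fastest_within_budget(work: float, budget: float,
                          max_gpus: int = 64, p: float = 0.5):
    """Largest (hence fastest) GPU count whose cost fits `budget`, or None.

    Under s(k) = k**p with p < 1, cost = work * k**(1 - p) grows with k,
    so the last k that fits is also the fastest affordable choice.
    """
    best = None
    for k in range(1, max_gpus + 1):
        if cost_gpu_hours(work, k, p) <= budget:
            best = k
    return best


if __name__ == "__main__":
    for k in (1, 4, 16):
        print(f"k={k:2d}  runtime={runtime_hours(10, k):.1f} h  "
              f"cost={cost_gpu_hours(10, k):.1f} GPU-h")
    print("fastest k within 40 GPU-h:", fastest_within_budget(10, 40))
```

With W = 10 under this model, going from 1 to 4 GPUs halves runtime (10 h to 5 h) but doubles the bill (10 to 20 GPU-hours). At 16 GPUs, the job takes 2.5 h for 40 GPU-hours. The problem the paper addresses is harder than this single-job view. The budget covers a whole stream of jobs, so GPU-hours spent speeding up one job are unavailable to the others.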
Taxonomy
Topics: Cloud Computing and Resource Management · Distributed and Parallel Computing Systems · Big Data and Digital Economy
