How to Rent GPUs on a Budget
Zhouzi Li, Benjamin Berg, Arpan Mukhopadhyay, Mor Harchol-Balter

TL;DR
This paper develops an optimal GPU rental policy for cloud-based ML training jobs. The policy minimizes mean response time under a long-term GPU usage budget, accounting for each job's parallelizability and size.
Contribution
The paper derives the first optimal rental policy that balances cost against response time for a stream of diverse ML training jobs, under minimal assumptions.
Findings
The optimal policy specifies how many GPUs to rent and how to allocate them at each moment in time.
The policy balances training speed against the budget constraint.
The results apply to diverse job streams without strict assumptions.
Abstract
The explosion in Machine Learning (ML) over the past ten years has led to a dramatic increase in demand for GPUs to train ML models. Because it is prohibitively expensive for most users to build and maintain a large GPU cluster, large cloud providers (Microsoft Azure, Amazon AWS, Google Cloud) have seen explosive growth in demand for renting cloud-based GPUs. In this cloud-computing paradigm, a user must specify their demand for GPUs at every moment in time, and will pay for every GPU-hour they use. ML training jobs are known to be parallelizable to different degrees. Given a stream of ML training jobs, a user typically wants to minimize the mean response time across all jobs. Here, the response time of a job denotes the time from when a job arrives until it is complete. Additionally, the user is constrained by some operating budget. Specifically, in this paper the user is constrained…
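The tension the abstract describes between GPU-hours paid and response time can be made concrete with a toy calculation. The sketch below is not the paper's policy. It assumes a hypothetical power-law speedup (k GPUs run a job k**p times faster, with 0 < p <= 1) and a single job that starts running the moment it arrives, so its response time equals its running time. The names `speedup`, `run_job`, `size`, and `p` are illustrative and do not come from the paper.

```python
def speedup(k: int, p: float) -> float:
    """Hypothetical power-law speedup: k GPUs run the job k**p times faster."""
    return k ** p


def run_job(size: float, k: int, p: float) -> tuple[float, float]:
    """Return (response_time_hours, gpu_hours) for one job run alone on k GPUs.

    `size` is the job's running time in hours on a single GPU (an assumption
    made for this example). The job starts on arrival, so its response time
    is just its running time.
    """
    time = size / speedup(k, p)
    return time, k * time  # the user pays for every GPU-hour used


# An 8-hour job with sublinear speedup (p = 0.5): each extra GPU shortens
# the response time but raises the total GPU-hours paid.
for k in (1, 2, 4, 8):
    t, cost = run_job(size=8.0, k=k, p=0.5)
    print(f"{k} GPUs: {t:.2f} h response time, {cost:.2f} GPU-hours")
```

Under this assumed speedup, a perfectly parallelizable job (p = 1) costs the same number of GPU-hours however many GPUs it uses, while any p < 1 makes faster completion strictly more expensive. That is why a rental policy operating under a budget has to weigh the two against each other.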
Taxonomy
Topics: Distributed and Parallel Computing Systems
