GPU Sharing with Triples Mode

Chansup Byun; Albert Reuther; LaToya Anderson; William Arcand; Bill Bergeron; David Bestor; Alexander Bonn; Daniel Burrill; Vijay Gadepally; Michael Houle; Matthew Hubbell; Hayden Jananthan; Michael Jones; Piotr Luszczek; Peter Michaleas; Lauren Milechin; Guillermo Morales; Julie Mullen; Andrew Prout; Antonio Rosa; Charles Yee; Jeremy Kepner

arXiv:2410.22254 · cs.DC · April 8, 2025

Open Access

TL;DR

This paper introduces a GPU sharing method called triples mode, which improves GPU utilization and throughput for AI applications, addressing supply constraints and costs in HPC centers.

Contribution

The paper presents a novel GPU sharing approach supported by custom tools, overcoming limitations of existing methods and enhancing GPU efficiency for AI/ML workloads.

Findings

1. Significant improvement in GPU utilization.
2. Enhanced throughput performance for AI applications.
3. Easy-to-use sharing mode implemented.

Abstract

There is tremendous interest in AI/ML technologies due to the proliferation of generative AI applications such as ChatGPT. This trend has significantly increased demand for GPUs, which are the workhorses for training AI models. Because GPUs are costly and in limited supply, there is growing interest in optimizing GPU usage in HPC centers. The MIT Lincoln Laboratory Supercomputing Center (LLSC) has developed an easy-to-use GPU sharing feature supported by LLSC-developed tools, including LLsub and LLMapReduce. This approach overcomes some of the limitations of existing methods for GPU sharing. It allows users to apply GPU sharing whenever possible while they are developing their AI/ML models, running parametric studies on their AI models, or executing other GPU applications. Based on our initial experimental results with GPU sharing, GPU sharing with triples mode is easy…

Taxonomy

Topics: Distributed and Parallel Computing Systems · Parallel Computing and Optimization Techniques