Lightning: Scaling the GPU Programming Model Beyond a Single GPU

Stijn Heldens (1, 2), Pieter Hijma (2, 3), Ben van Werkhoven (1), Jason Maassen (1), Rob V. van Nieuwpoort (1, 2) ((1) Netherlands eScience Center, (2) University of Amsterdam, (3) VU University Amsterdam)

arXiv:2202.05549 · cs.DC · March 3, 2022 · 1 citation

TL;DR

Lightning is a framework that enables easy scaling of GPU applications across multiple GPUs and nodes, supporting seamless data spilling and efficient workload distribution, significantly improving performance and capacity beyond a single GPU.

Contribution

Lightning introduces a scalable GPU programming framework that simplifies multi-GPU and multi-node execution with minimal code modifications, supporting data spilling and efficient resource utilization.

Findings

01 Achieves up to 57.2x speedup over CPU with 16 GPUs across 4 nodes.

02 Supports up to 32 GPUs with excellent scalability and performance.

03 Enables existing CUDA kernels to be adapted easily for multi-GPU execution.

Abstract

The GPU programming model is primarily aimed at the development of applications that run on one GPU. However, this limits the scalability of GPU code to the capabilities of a single GPU in terms of compute power and memory capacity. To scale GPU applications further, a great engineering effort is typically required: work and data must be divided over multiple GPUs by hand, possibly in multiple nodes, and data must be manually spilled from GPU memory to higher-level memories. We present Lightning: a framework that follows the common GPU programming paradigm but enables scaling to large problems with ease. Lightning supports multi-GPU execution of GPU kernels, even across multiple nodes, and seamlessly spills data to higher-level memories (main memory and disk). Existing CUDA kernels can easily be adapted for use in Lightning, with data access annotations on these kernels allowing…
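To make the "divided by hand" step concrete, here is a minimal Python sketch of the bookkeeping it involves. It is not Lightning's API: the names `Chunk`, `split_work`, and `input_region` are hypothetical, and the access pattern (thread i reads elements i - radius through i + radius of a 1D input) is an assumed example of the kind of information a data access annotation could convey.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Chunk:
    """Hypothetical record: a contiguous range of thread indices for one GPU."""
    device: int  # index of the GPU that would run this chunk
    start: int   # first thread index (inclusive)
    end: int     # last thread index (exclusive)


def split_work(n_threads: int, n_devices: int) -> List[Chunk]:
    """Divide n_threads into contiguous, near-equal chunks, one per device."""
    base, extra = divmod(n_threads, n_devices)
    chunks, start = [], 0
    for dev in range(n_devices):
        # The first `extra` devices take one additional thread each.
        size = base + (1 if dev < extra else 0)
        chunks.append(Chunk(dev, start, start + size))
        start += size
    return chunks


def input_region(chunk: Chunk, radius: int, n_elements: int) -> Tuple[int, int]:
    """Half-open range [lo, hi) of input elements the chunk reads, assuming
    thread i reads elements i - radius .. i + radius (clamped to the array)."""
    lo = max(chunk.start - radius, 0)
    hi = min(chunk.end + radius, n_elements)
    return lo, hi


if __name__ == "__main__":
    for c in split_work(10, 4):
        print(c, "reads", input_region(c, radius=1, n_elements=10))
```

Under these assumptions, each GPU needs only its own slice of the input plus a small halo, which is the data that would have to be resident in that GPU's memory before its chunk runs.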

Code & Models

Repositories

lightning-project/lightning (Official)

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Cloud Computing and Resource Management · Advanced Neural Network Applications