Lightning: Scaling the GPU Programming Model Beyond a Single GPU
Stijn Heldens (1, 2), Pieter Hijma (2, 3), Ben van Werkhoven (1), Jason Maassen (1), Rob V. van Nieuwpoort (1, 2) ((1) Netherlands eScience Center, (2) University of Amsterdam, (3) VU University Amsterdam)

TL;DR
Lightning is a framework that enables easy scaling of GPU applications across multiple GPUs and nodes, supporting seamless data spilling and efficient workload distribution, significantly improving performance and capacity beyond a single GPU.
Contribution
Lightning introduces a scalable GPU programming framework that simplifies multi-GPU and multi-node execution with minimal code modifications, supporting data spilling and efficient resource utilization.
Findings
Achieves up to 57.2x speedup over CPU with 16 GPUs across 4 nodes.
Supports up to 32 GPUs with excellent scalability and performance.
Enables existing CUDA kernels to be adapted easily for multi-GPU execution.
Abstract
The GPU programming model is primarily aimed at the development of applications that run on one GPU. However, this limits the scalability of GPU code to the capabilities of a single GPU in terms of compute power and memory capacity. To scale GPU applications further, a great engineering effort is typically required: work and data must be divided over multiple GPUs by hand, possibly in multiple nodes, and data must be manually spilled from GPU memory to higher-level memories. We present Lightning: a framework that follows the common GPU programming paradigm but enables scaling to large problems with ease. Lightning supports multi-GPU execution of GPU kernels, even across multiple nodes, and seamlessly spills data to higher-level memories (main memory and disk). Existing CUDA kernels can easily be adapted for use in Lightning, with data access annotations on these kernels allowing…
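To make concrete the manual effort the abstract refers to ("work and data must be divided over multiple GPUs by hand"), here is a minimal sketch of the bookkeeping involved. It is not Lightning's API and is not taken from the paper. The `Chunk` type, the `partition` function, and the `halo` parameter are hypothetical names chosen for illustration. The sketch assumes a 1-D kernel in which output element i reads input elements i - halo through i + halo, and it only computes which index ranges each device would own and need; it performs no GPU work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    device: int      # index of the GPU that owns this chunk (hypothetical numbering)
    work_begin: int  # first output index computed by this device
    work_end: int    # one past the last output index computed by this device
    data_begin: int  # first input index that must be resident on the device
    data_end: int    # one past the last input index that must be resident


def partition(n: int, num_devices: int, halo: int = 1) -> list:
    """Split n output elements into contiguous blocks, one per device.

    Assumption: output i reads input[i - halo : i + halo + 1], clamped to
    the array bounds (a simple 1-D stencil access pattern).
    """
    if n < 0 or halo < 0 or num_devices <= 0:
        raise ValueError("n and halo must be >= 0 and num_devices must be > 0")

    # Spread the remainder over the first `extra` devices so that block
    # sizes differ by at most one element.
    base, extra = divmod(n, num_devices)
    chunks = []
    begin = 0
    for device in range(num_devices):
        size = base + (1 if device < extra else 0)
        end = begin + size
        if size > 0:
            # The data region is the work region widened by the halo.
            data_begin = max(0, begin - halo)
            data_end = min(n, end + halo)
        else:
            # A device with no work needs no data.
            data_begin = data_end = begin
        chunks.append(Chunk(device, begin, end, data_begin, data_end))
        begin = end
    return chunks


if __name__ == "__main__":
    for chunk in partition(n=10, num_devices=4, halo=1):
        print(chunk)
```

In a hand-written multi-GPU program, each of these ranges would then drive a device allocation, a host-to-device copy, and a kernel launch with an index offset, and the overlapping halo regions would have to be kept consistent between devices. This is the kind of bookkeeping the abstract describes as a great engineering effort.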
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Cloud Computing and Resource Management · Advanced Neural Network Applications
