RegDem: Increasing GPU Performance via Shared Memory Register Spilling
Putt Sakdhnagool, Amit Sabne, Rudolf Eigenmann

TL;DR
This paper introduces RegDem, a binary translation method that spills excess GPU registers into underutilized shared memory instead of off-chip memory, raising occupancy and outperforming the register spilling done by the nvcc compiler.
Contribution
RegDem is a binary translation technique that transforms GPU assembly code (SASS) to spill registers into shared memory, increasing GPU occupancy and performance.
Findings
RegDem achieves up to 18% performance improvement over nvcc.
Spilling registers into shared memory increases GPU occupancy.
A compile-time predictor helps select optimal program variants.
Abstract
GPU utilization, measured as occupancy, is limited by the parallel threads' combined usage of on-chip resources, such as registers and the programmer-managed shared memory. Higher resource demand means lower effective parallel thread count, and therefore lower program performance. Our investigation found that registers are often the occupancy limiters. The de-facto nvcc compiler-based approach spills excessive registers to the off-chip memory, ignoring the shared memory and leaving the on-chip resources underutilized. To mitigate the register demand, this paper presents a binary translation technique, called RegDem, that spills excessive registers to the underutilized shared memory by transforming the GPU assembly code (SASS). Most GPU programs do not fully use shared memory, thus allowing RegDem to use it for register spilling. The higher occupancy achieved by RegDem outweighs the…
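The occupancy argument in the abstract can be made concrete with a small calculation. The sketch below is not taken from the paper: it uses a deliberately simplified resource model, and the per-SM limits (65,536 registers, 48 KB of shared memory, 2,048 resident threads) are assumed values chosen for illustration. It ignores allocation granularity and the hardware cap on blocks per SM. The function names are hypothetical. It assumes each spilled 32-bit register costs 4 bytes of shared memory per thread.

```python
# Simplified occupancy model. All hardware limits are illustrative assumptions,
# not figures from the RegDem paper.
REGS_PER_SM = 65536         # assumed 32-bit registers per SM
SMEM_PER_SM = 48 * 1024     # assumed shared memory per SM, in bytes
MAX_THREADS_PER_SM = 2048   # assumed resident-thread limit per SM


def resident_blocks(regs_per_thread, threads_per_block, smem_per_block):
    """Number of thread blocks that fit on one SM under this model."""
    by_threads = MAX_THREADS_PER_SM // threads_per_block
    by_regs = REGS_PER_SM // (regs_per_thread * threads_per_block)
    # A kernel that uses no shared memory is not limited by it.
    by_smem = SMEM_PER_SM // smem_per_block if smem_per_block else by_threads
    return min(by_threads, by_regs, by_smem)


def occupancy(regs_per_thread, threads_per_block, smem_per_block):
    """Fraction of the SM's thread slots that are occupied."""
    blocks = resident_blocks(regs_per_thread, threads_per_block, smem_per_block)
    return blocks * threads_per_block / MAX_THREADS_PER_SM


def spill_to_shared(regs_per_thread, threads_per_block, smem_per_block, n):
    """Move n registers per thread into shared memory (4 bytes each)."""
    extra_smem = n * 4 * threads_per_block
    return regs_per_thread - n, threads_per_block, smem_per_block + extra_smem


if __name__ == "__main__":
    before = (36, 256, 0)                  # registers limit the SM to 7 blocks
    after = spill_to_shared(*before, n=4)  # 32 registers, 4096 bytes per block
    print(occupancy(*before), occupancy(*after))
```

In this hypothetical configuration, moving four registers per thread into shared memory lifts the register limit from 7 to 8 resident blocks, while the added 4 KB of shared memory per block stays well under the assumed 48 KB budget. Spilling too many registers would make shared memory the limiter instead, which is one reason a per-kernel choice among variants matters.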
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Embedded Systems Design Techniques
