Scratchpad Sharing in GPUs
Vishwesh Jatala, Jayvant Anantpur, Amey Karkare

TL;DR
This paper introduces Scratchpad Sharing, a combination of architectural and compiler optimizations for GPUs that improves scratchpad memory utilization and yields an average performance improvement of 19% across the tested GPGPU kernels.
Contribution
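The paper proposes Scratchpad Sharing, which launches additional thread blocks in each SM that use otherwise unutilized scratchpad memory and share scratchpad with other resident blocks, supported by Owner Warp First (OWF) warp scheduling and compiler optimizations.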
Findings
Average performance improvement of 19% across tested kernels
Maximum improvement of 92.17% in certain kernels
Scratchpad memory that would otherwise remain unutilized is put to use by additional thread blocks
Abstract
GPGPU applications exploit on-chip scratchpad memory available in the Graphics Processing Units (GPUs) to improve performance. The amount of thread-level parallelism present in the GPU is limited by the number of resident threads, which in turn depends on the availability of scratchpad memory in its streaming multiprocessor (SM). Since the scratchpad memory is allocated at thread-block granularity, part of the memory may remain unutilized. In this paper, we propose architectural and compiler optimizations to improve the scratchpad utilization. Our approach, Scratchpad Sharing, addresses scratchpad under-utilization by launching additional thread blocks in each SM. These thread blocks use unutilized scratchpad and also share scratchpad with other resident blocks. To improve the performance of scratchpad sharing, we propose Owner Warp First (OWF) scheduling that schedules warps from the…
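The under-utilization described in the abstract follows from simple arithmetic: because scratchpad is handed out one whole thread block at a time, any remainder smaller than a block's requirement sits idle. The sketch below illustrates only that baseline effect, not the paper's sharing mechanism. The function name `scratchpad_occupancy`, the optional `max_blocks` cap, and the 16 KB / 7 KB figures are all hypothetical, chosen for illustration and not taken from the paper.

```python
def scratchpad_occupancy(scratchpad_bytes, bytes_per_block, max_blocks=None):
    """Return (resident_blocks, unused_bytes) for one SM.

    Assumes scratchpad is the only limiting resource unless max_blocks
    (a hypothetical cap standing in for other hardware limits) is given.
    """
    if bytes_per_block <= 0:
        raise ValueError("bytes_per_block must be positive")

    # Allocation is at thread-block granularity: only whole blocks fit.
    resident = scratchpad_bytes // bytes_per_block
    if max_blocks is not None:
        resident = min(resident, max_blocks)

    # Whatever is left over cannot host another full block.
    unused = scratchpad_bytes - resident * bytes_per_block
    return resident, unused


# Hypothetical SM with 16 KB of scratchpad and a kernel needing 7 KB per block.
resident, unused = scratchpad_occupancy(16 * 1024, 7 * 1024)
print(resident, unused)  # 2 2048
```

With these illustrative numbers, two blocks are resident and 2 KB of scratchpad is left idle; that leftover is the kind of memory the abstract says additional thread blocks are launched to use.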
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques · Interconnection Networks and Systems
