Improving GPU Performance Through Resource Sharing
Vishwesh Jatala, Jayvant Anantpur, Amey Karkare

TL;DR
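This paper introduces resource sharing techniques for GPUs that launch additional thread blocks on each Streaming Multiprocessor (SM) using registers and scratchpad memory that would otherwise sit unused, yielding speedups of up to 24% with register sharing and up to 30% with scratchpad sharing.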
Contribution
It proposes a resource sharing approach for GPU SMs that uses the registers and scratchpad (shared) memory left over by thread-block-granularity allocation to launch more thread blocks and raise throughput.
Findings
Maximum 24% performance improvement with register sharing
Maximum 30% improvement with scratchpad sharing
Average improvements of 11% with register sharing and 12.5% with scratchpad sharing
Abstract
Graphics Processing Units (GPUs) consisting of Streaming Multiprocessors (SMs) achieve high throughput by running a large number of threads and context switching among them to hide execution latencies. The number of thread blocks, and hence the number of threads that can be launched on an SM, depends on the resource usage (e.g., number of registers, amount of shared memory) of the thread blocks. Since threads are allocated to an SM at thread block granularity, some of the resources may not be used up completely and hence will be wasted. We propose an approach that shares the resources of an SM to utilize the wasted resources by launching more thread blocks. We show the effectiveness of our approach for two resources: registers and scratchpad (shared memory). We further propose optimizations to hide long execution latencies, thus reducing the number of stall…
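Illustrative sketch (not from the paper): the block-granularity waste described in the abstract can be shown with a small Python calculation. The SM limits and kernel figures below are hypothetical, and the names SMLimits, BlockUsage, resident_blocks, and unused_resources are invented for this example. The model is deliberately simplified: it ignores the allocation granularity that real hardware applies to registers and shared memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SMLimits:
    """Per-SM capacities (hypothetical values, not taken from the paper)."""
    registers: int
    scratchpad_bytes: int
    max_threads: int
    max_blocks: int


@dataclass(frozen=True)
class BlockUsage:
    """Resources one thread block needs (hypothetical kernel)."""
    threads: int
    registers_per_thread: int
    scratchpad_bytes: int


def resident_blocks(sm: SMLimits, blk: BlockUsage) -> int:
    """Whole thread blocks that fit on one SM; the scarcest resource decides."""
    regs_per_block = blk.threads * blk.registers_per_thread
    limits = [sm.max_blocks, sm.max_threads // blk.threads]
    if regs_per_block > 0:
        limits.append(sm.registers // regs_per_block)
    if blk.scratchpad_bytes > 0:
        limits.append(sm.scratchpad_bytes // blk.scratchpad_bytes)
    return min(limits)


def unused_resources(sm: SMLimits, blk: BlockUsage) -> dict:
    """Resources left idle because a partial block cannot be launched."""
    n = resident_blocks(sm, blk)
    regs_per_block = blk.threads * blk.registers_per_thread
    return {
        "blocks": n,
        "unused_registers": sm.registers - n * regs_per_block,
        "unused_scratchpad_bytes": sm.scratchpad_bytes - n * blk.scratchpad_bytes,
    }


# Assumed configuration: 32768 registers, 16 KB scratchpad, 1536 threads, 8 blocks per SM.
sm = SMLimits(registers=32768, scratchpad_bytes=16384, max_threads=1536, max_blocks=8)
# Assumed kernel: 256 threads per block, 36 registers per thread, no scratchpad.
blk = BlockUsage(threads=256, registers_per_thread=36, scratchpad_bytes=0)

report = unused_resources(sm, blk)
print(report)
```

Under these assumed numbers each block needs 9216 registers, so only three blocks fit and 5120 registers remain idle, too few for a fourth block. Leftover capacity of this kind is what the abstract refers to as wasted resources that sharing aims to put to use.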
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Cloud Computing and Resource Management
