A Framework for Accelerating Bottlenecks in GPU Execution with Assist Warps
Nandita Vijaykumar, Gennady Pekhimenko, Adwait Jog, Saugata Ghose, Abhishek Bhowmick, Rachata Ausavarungnirun, Chita Das, Mahmut Kandemir, Todd C. Mowry, Onur Mutlu

TL;DR
This paper introduces CABA, a flexible framework that uses assist warps to put idle GPU resources to work on tasks such as data compression and memoization. This lets it alleviate whichever bottleneck limits execution and improves performance.
Contribution
The paper presents CABA, a novel framework that automatically generates assist warps to mitigate GPU bottlenecks, enhancing resource utilization and performance.
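Below is a minimal sketch of the core idea. It assumes a single-issue core in which assist-warp instructions may only take issue slots that no regular (parent) warp can use. The function name `schedule` and the cycle-trace encoding are hypothetical illustrations, not CABA's actual hardware design.

```python
def schedule(ready_parent, assist_tasks):
    """Simulate a single-issue core cycle by cycle (hypothetical model).

    ready_parent: list of booleans, one per cycle; True if some regular
        (parent) warp has an instruction ready to issue in that cycle.
    assist_tasks: number of assist-warp instructions waiting to run.

    Returns the per-cycle issue trace: 'P' (parent), 'A' (assist), '-' (idle).
    """
    pending = assist_tasks
    trace = []
    for parent_ready in ready_parent:
        if parent_ready:
            trace.append("P")  # regular warps always win the issue slot
        elif pending > 0:
            trace.append("A")  # the slot would be idle, so assist work uses it
            pending -= 1
        else:
            trace.append("-")  # nothing to issue: the pipeline stays idle
    return "".join(trace)
```

For example, `schedule([True, False, False, True, False], 2)` returns `"PAAP-"`. The two assist instructions fill stall cycles without delaying either parent instruction.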
Findings
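When used for data compression, CABA improves performance by 41.7% on average across a set of memory-bandwidth-sensitive GPU applications.
Assist warps can perform data compression, reducing the off-chip memory bandwidth an application consumes.
The framework is general. Assist warps can be tailored to the bottleneck at hand, e.g., compression when memory bandwidth is the limit, or memoization to avoid redundant computation.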
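The following example shows how compression cuts memory traffic. It is a simplified base+delta encoding of one cache line, in the spirit of Base-Delta-Immediate (BDI) compression, which is one of the algorithms evaluated in the CABA work. This is a sketch under stated assumptions, not the paper's assist-warp code:

* It assumes 4-byte words.
* It omits BDI's implicit zero base, alternative base sizes, and wraparound arithmetic.
* The function names are hypothetical.

```python
def base_delta_compress(words, word_bytes=4, delta_options=(1, 2)):
    """Try to encode a cache line as one base word plus narrow signed deltas.

    words: non-empty list of unsigned integers, each word_bytes wide
        (e.g. 32 words for an assumed 128-byte line of 4-byte words).
    Returns (compressed_size_bytes, base, deltas) for the smallest delta
    width that fits, or (uncompressed_size, None, None) if none fits.
    """
    base = words[0]
    deltas = [w - base for w in words]
    for d in sorted(delta_options):
        # Range of a signed d-byte integer.
        lo, hi = -(1 << (8 * d - 1)), (1 << (8 * d - 1)) - 1
        if all(lo <= x <= hi for x in deltas):
            return word_bytes + d * len(words), base, deltas
    return word_bytes * len(words), None, None


def decompress(base, deltas):
    """Recover the original words: each word is base + its delta."""
    return [base + d for d in deltas]
```

A line holding 32 consecutive values, such as `[0x1000 + i for i in range(32)]`, shrinks from 128 bytes to 36 (one 4-byte base plus 32 one-byte deltas). In CABA, this kind of (de)compression work runs in assist warps using otherwise idle core resources.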
Abstract
Modern Graphics Processing Units (GPUs) are well provisioned to support the concurrent execution of thousands of threads. Unfortunately, different bottlenecks during execution and heterogeneous application requirements create imbalances in utilization of resources in the cores. For example, when a GPU is bottlenecked by the available off-chip memory bandwidth, its computational resources are often overwhelmingly idle, waiting for data from memory to arrive. This work describes the Core-Assisted Bottleneck Acceleration (CABA) framework that employs idle on-chip resources to alleviate different bottlenecks in GPU execution. CABA provides flexible mechanisms to automatically generate "assist warps" that execute on GPU cores to perform specific tasks that can improve GPU performance and efficiency. CABA enables the use of idle computational units and pipelines to alleviate the memory…
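The abstract argues that a bandwidth-bound GPU leaves its compute idle. A simple overlap model makes this concrete: run time is set by whichever is busier, the memory system or the compute pipelines. This is a back-of-the-envelope sketch. The function `kernel_time`, its parameters, and the example numbers are hypothetical and do not come from the paper.

```python
def kernel_time(bytes_moved, bandwidth, compute_cycles,
                compression_ratio=1.0, extra_cycles_per_byte=0.0):
    """Estimate kernel run time (in cycles) when compute and memory overlap.

    bytes_moved: uncompressed bytes transferred over the off-chip bus.
    bandwidth: bytes the memory system delivers per cycle.
    compute_cycles: cycles of useful (non-compression) computation.
    compression_ratio: uncompressed size / compressed size (1.0 = none).
    extra_cycles_per_byte: compute cost of (de)compression per uncompressed byte.
    """
    memory_cycles = bytes_moved / compression_ratio / bandwidth
    total_compute = compute_cycles + extra_cycles_per_byte * bytes_moved
    # With perfect overlap, the busier resource sets the run time;
    # the other one sits partly idle.
    return max(memory_cycles, total_compute)


baseline = kernel_time(1_000_000, 100, 4000)
compressed = kernel_time(1_000_000, 100, 4000, 2.0, 0.002)
print(baseline, compressed, baseline / compressed)
```

With these made-up numbers, the baseline is memory-bound: 10,000 memory cycles against 4,000 compute cycles. Compression halves the traffic at a cost of 2,000 extra compute cycles. The kernel becomes compute-bound at 6,000 cycles, roughly a 1.67x speedup. If the extra work exceeded the idle compute headroom, compression would slow the kernel down instead. In this model, such assist work only pays off when the resource it consumes is underused.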
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Cloud Computing and Resource Management
