
TL;DR
This paper introduces a flexible, open-source GPU load-balancing framework that enhances performance and productivity for irregular parallel algorithms, demonstrated by a novel matrix multiplication method achieving significant speedups.
Contribution
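It presents a new GPU load-balancing abstraction that decouples load balancing from work processing and supports both static and dynamic schedules. It also introduces Stream-K, a work-centric parallelization of GEMM (general matrix multiplication) that outperforms CUTLASS and cuBLAS.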
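"Work-centric" means that Stream-K splits the total amount of multiply-accumulate work evenly across a fixed number of workers, instead of giving each worker whole output tiles. The sketch below is a minimal, sequential illustration of that partitioning idea. It assumes a tiled GEMM in which each output tile's K dimension is divided into `blk_k`-sized MAC-loop iterations. The function name `stream_k_partition` and its parameters are hypothetical and are not the framework's API. The sketch omits the cross-worker fixup (reduction of partial sums) that partial tiles would need on a real GPU.

```python
import math


def stream_k_partition(m, n, k, blk_m, blk_n, blk_k, num_workers):
    """Hypothetical sketch: split the flattened MAC-loop iteration space of a
    tiled GEMM evenly across a fixed number of workers (e.g. one per SM).

    Returns, for each worker, a list of (tile, k_begin, k_end) segments, where
    [k_begin, k_end) is a range of MAC-loop iterations within that output tile.
    """
    tiles_m = math.ceil(m / blk_m)
    tiles_n = math.ceil(n / blk_n)
    iters_per_tile = math.ceil(k / blk_k)
    total_iters = tiles_m * tiles_n * iters_per_tile

    # Every worker receives the same number of iterations (the last may get fewer),
    # regardless of where output-tile boundaries fall.
    per_worker = math.ceil(total_iters / num_workers)

    assignments = []
    for w in range(num_workers):
        begin = w * per_worker
        end = min(begin + per_worker, total_iters)
        segments = []
        it = begin
        while it < end:
            tile = it // iters_per_tile          # which output tile this iteration belongs to
            k_begin = it % iters_per_tile        # local iteration index inside that tile
            # Stop at the tile boundary or at the end of this worker's share.
            k_end = min(iters_per_tile, k_begin + (end - it))
            segments.append((tile, k_begin, k_end))
            it += k_end - k_begin
        assignments.append(segments)
    return assignments


if __name__ == "__main__":
    # 2x2 output tiles, 4 MAC-loop iterations per tile, 3 workers.
    for w, segs in enumerate(stream_k_partition(256, 256, 512, 128, 128, 128, 3)):
        print(w, segs)
```

With 16 iterations and 3 workers, each worker gets at most 6 iterations, so tile 1 is split between workers 0 and 1. Any segment that does not cover the full range `[0, iters_per_tile)` is a partial tile whose results must be combined with another worker's results.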
Findings
Stream-K achieves up to 14x peak speedup on GEMM.
The framework improves GPU utilization for irregular workloads.
Stream-K outperforms CUTLASS and cuBLAS in speed and consistency.
Abstract
Fine-grained workload and resource balancing is the key to high performance for both regular and irregular computations on GPUs. In this dissertation, we conduct an extensive survey of existing load-balancing techniques to build an abstraction that addresses the difficulty of scheduling computations on the GPU. We propose a fine-grained GPU load-balancing abstraction that decouples load balancing from work processing and aims to support both static and dynamic schedules, with a programmable interface for implementing new load-balancing schedules. Prior to our work, the only way to unleash the GPU's potential on irregular problems was to balance the workload through application-specific, tightly coupled load-balancing techniques. With our open-source load-balancing framework, we hope to improve programmers' productivity when developing irregular-parallel algorithms on the GPU, and also…
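Decoupling load balancing from work processing means that the schedule decides *which worker handles which work item*, and the computation is written once against that mapping. The following sequential Python sketch illustrates this idea. It is a simplified, hypothetical model and not the framework's actual interface. The sketch makes several assumptions. Work is described by CSR-style segment `offsets`. A schedule is any generator that yields `(worker, segment, item)` triples. The schedule names are illustrative. The sketch swaps two schedules under an unchanged SpMV kernel: one that assigns whole segments to workers, and one that splits items evenly across workers.

```python
import math
from bisect import bisect_right


def segment_per_worker_schedule(offsets, num_workers):
    """Hypothetical schedule: worker w owns whole segments w, w + P, w + 2P, ..."""
    num_segments = len(offsets) - 1
    for w in range(num_workers):
        for seg in range(w, num_segments, num_workers):
            for item in range(offsets[seg], offsets[seg + 1]):
                yield w, seg, item


def work_balanced_schedule(offsets, num_workers):
    """Hypothetical schedule: split work items evenly, then find each item's segment."""
    total = offsets[-1]
    per_worker = math.ceil(total / num_workers) if total else 0
    for w in range(num_workers):
        begin = w * per_worker
        end = min(begin + per_worker, total)
        for item in range(begin, end):
            # Binary search: last segment whose start offset is <= item (skips empty segments).
            seg = bisect_right(offsets, item) - 1
            yield w, seg, item


def spmv(offsets, cols, vals, x, schedule, num_workers):
    """Work processing written once; it never inspects how work was distributed."""
    y = [0.0] * (len(offsets) - 1)
    for _worker, row, nz in schedule(offsets, num_workers):
        # On a GPU, workers sharing a row would need an atomic add or a reduction here.
        y[row] += vals[nz] * x[cols[nz]]
    return y


if __name__ == "__main__":
    offsets = [0, 2, 2, 5]          # 3 rows; row 1 is empty
    cols = [0, 1, 0, 1, 2]
    vals = [1, 2, 3, 4, 5]
    x = [1, 1, 1]
    for sched in (segment_per_worker_schedule, work_balanced_schedule):
        print(sched.__name__, spmv(offsets, cols, vals, x, sched, 2))
```

Both schedules produce the same result, but with two workers the segment-per-worker schedule gives one worker all 5 nonzeros and the other none, whereas the work-balanced schedule splits them 3 and 2.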
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems · Interconnection Networks and Systems
