GPU Load Balancing

Muhammad Osama

arXiv:2212.08964·cs.DC·December 20, 2022

Open Access

TL;DR

This dissertation introduces a flexible, open-source GPU load-balancing framework intended to improve both performance and programmer productivity for irregular-parallel algorithms. It demonstrates the approach with Stream-K, a novel matrix multiplication method with up to 14x peak speedup on GEMM.

Contribution

It presents a new GPU load-balancing abstraction supporting static and dynamic schedules, and introduces Stream-K, a work-centric GEMM parallelization with superior performance.
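As a rough illustration of what "work-centric" can mean for GEMM, the sketch below flattens every multiply-accumulate (MAC) loop iteration of a tiled GEMM into one range and gives each worker a near-equal contiguous share, so a worker's share may start or end in the middle of an output tile. It is a host-side Python sketch under stated assumptions, not the dissertation's implementation: the function name `stream_k_partition` and its parameters are hypothetical, and the step that combines partial sums for tiles shared between workers is omitted.

```python
from math import ceil

def stream_k_partition(m, n, k, blk_m, blk_n, blk_k, num_workers):
    """Evenly split all MAC-loop iterations of a tiled GEMM across workers.

    Returns one list per worker of (tile_index, k_iter_begin, k_iter_end)
    segments. Hypothetical helper for illustration only.
    """
    tiles = ceil(m / blk_m) * ceil(n / blk_n)   # number of output tiles
    iters_per_tile = ceil(k / blk_k)            # MAC-loop iterations per tile
    total_iters = tiles * iters_per_tile        # flattened iteration space

    assignments = []
    for w in range(num_workers):
        # Contiguous, near-equal share of the flattened iteration space.
        begin = (w * total_iters) // num_workers
        end = ((w + 1) * total_iters) // num_workers
        segments = []
        it = begin
        while it < end:
            tile = it // iters_per_tile
            tile_first = tile * iters_per_tile
            seg_end = min(end, tile_first + iters_per_tile)
            # Store the segment as k-iteration offsets local to the tile.
            segments.append((tile, it - tile_first, seg_end - tile_first))
            it = seg_end
        assignments.append(segments)
    return assignments

# 384x384 output with 128x128 tiles gives 9 tiles; k=512 with blk_k=128
# gives 4 iterations per tile, so 36 iterations are shared by 4 workers.
for worker, segments in enumerate(stream_k_partition(384, 384, 512, 128, 128, 128, 4)):
    print(worker, segments)
```

With 9 tiles and 4 workers, assigning whole tiles would leave some workers with 3 tiles and others with 2, whereas the iteration-level split gives every worker 9 iterations.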

Findings

01. Stream-K achieves up to 14x peak speedup on GEMM.

02. The framework improves GPU utilization for irregular workloads.

03. Stream-K outperforms CUTLASS and cuBLAS in speed and consistency.

Abstract

Fine-grained workload and resource balancing is the key to high performance for regular and irregular computations on GPUs. In this dissertation, we conduct an extensive survey of existing load-balancing techniques to build an abstraction that addresses the difficulty of scheduling computations on the GPU. We propose a GPU fine-grained load-balancing abstraction that decouples load balancing from work processing and aims to support both static and dynamic schedules with a programmable interface to implement new load-balancing schedules. Prior to our work, the only way to unleash the GPU's potential on irregular problems has been to workload-balance through application-specific, tightly coupled load-balancing techniques. With our open-source framework for load-balancing, we hope to improve programmers' productivity when developing irregular-parallel algorithms on the GPU, and also…
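The idea of decoupling load balancing from work processing can be sketched in a few lines of host-side Python. The example assumes irregular work is described CSR-style, with `offsets` marking where each tile (for instance a sparse-matrix row) begins in a flat array of work items. The names `thread_mapped`, `atom_balanced`, and `row_sums` are hypothetical and are not the framework's API. Both schedules shown are static, and the sketch does not cover dynamic scheduling.

```python
from bisect import bisect_right

def thread_mapped(offsets, num_workers):
    """Schedule A: whole tiles are dealt to workers round-robin."""
    num_tiles = len(offsets) - 1
    for w in range(num_workers):
        yield [(t, a)
               for t in range(w, num_tiles, num_workers)
               for a in range(offsets[t], offsets[t + 1])]

def atom_balanced(offsets, num_workers):
    """Schedule B: the flat item range is split evenly, ignoring tile edges."""
    total = offsets[-1]
    for w in range(num_workers):
        begin = (w * total) // num_workers
        end = ((w + 1) * total) // num_workers
        # Binary search over offsets recovers the tile that owns each item.
        yield [(bisect_right(offsets, a) - 1, a) for a in range(begin, end)]

def row_sums(offsets, values, schedule, num_workers):
    """Work processing: sums each tile's values, unaware of the schedule used.

    On a real GPU, workers sharing a tile would need atomics or a fixup
    step. Here the workers run one after another, so a plain += is enough.
    """
    out = [0.0] * (len(offsets) - 1)
    loads = []
    for work in schedule(offsets, num_workers):
        loads.append(len(work))          # items handled by this worker
        for tile, item in work:
            out[tile] += values[item]
    return out, loads

offsets = [0, 8, 9, 10, 12]              # tile sizes: 8, 1, 1, 2
values = [1.0] * 12
for schedule in (thread_mapped, atom_balanced):
    print(schedule.__name__, row_sums(offsets, values, schedule, 4))
```

Swapping the schedule changes how evenly the 12 items are spread over the 4 workers, but the processing function and its result stay the same.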

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems · Interconnection Networks and Systems