Stream-K++: Adaptive GPU GEMM Kernel Scheduling and Selection using Bloom Filters
Harisankar Sadasivan, Muhammed Emin Ozturk, Muhammad Osama, Chris Millette, Astha Rai, Maksim Podkorytov, John Afaganis, Carlus Huang, Jing Zhang, Jun Liu

TL;DR
Stream-K++ enhances GPU GEMM kernel scheduling by expanding Stream-K's policy options from three to seven and using Bloom filters to discard unsuitable configurations quickly, yielding gains of up to 43% in select scenarios for AI workloads.
Contribution
It extends the Stream-K GEMM scheduling algorithm from three scheduling policies to seven and adds an efficient Bloom filter-based solution selector, enabling faster and more effective workload balancing on GPUs.
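To make the workload-balancing idea concrete, here is a minimal sketch of the basic Stream-K partitioning that Stream-K++ builds on: the multiply-accumulate iterations of all output tiles are laid out in one linear range and split as evenly as possible across workgroups, rather than assigning whole tiles. The function name, parameters, and tile sizes are hypothetical and chosen for illustration; this is not the Composable Kernel implementation, and it does not model the seven Stream-K++ policies.

```python
import math


def stream_k_partition(m, n, k, tile_m, tile_n, tile_k, num_workgroups):
    """Split all MAC iterations of a GEMM evenly across workgroups.

    Hypothetical helper for illustration only. Returns a list of
    half-open (start, end) iteration ranges, one per workgroup, plus
    the number of iterations per output tile.
    """
    # Number of output tiles covering the M x N result matrix.
    tiles = math.ceil(m / tile_m) * math.ceil(n / tile_n)
    # Each tile accumulates over K in chunks of tile_k.
    iters_per_tile = math.ceil(k / tile_k)
    total_iters = tiles * iters_per_tile

    # Even split: the first `extra` workgroups take one more iteration.
    base, extra = divmod(total_iters, num_workgroups)
    ranges = []
    start = 0
    for wg in range(num_workgroups):
        count = base + (1 if wg < extra else 0)
        ranges.append((start, start + count))
        start += count
    return ranges, iters_per_tile


# 9 output tiles on 4 workgroups: tile-per-workgroup scheduling would need
# 3 waves with the last wave mostly idle; here every workgroup gets 9 iterations.
ranges, iters_per_tile = stream_k_partition(384, 384, 128, 128, 128, 32, 4)
```

A workgroup whose range starts or ends in the middle of a tile produces a partial result for that tile, which must then be combined with the partial results of its neighbours; the sketch leaves that reduction step out.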
Findings
Up to 43% performance improvement in certain scenarios
Filters out up to 95.8% of unsuitable configurations rapidly
Maintains near-optimal performance for most problem sizes
Abstract
General matrix multiplication (GEMM) operations are the fundamental building blocks of computational domains including artificial intelligence (AI). As GPU architectures evolve and high-performance AI becomes increasingly important, optimizing GEMM performance becomes a fundamental problem that needs to be addressed. This paper introduces Stream-K++, an enhancement to the promising Stream-K GEMM scheduling algorithm for workload balancing. We expand Stream-K's scheduling policies from three to seven and implement an efficient solution selection mechanism using Bloom filters. Our approach rapidly eliminates up to 95.8% of unsuitable configurations while maintaining a 100% true-negative rate. Implemented using the AMD Composable Kernel library and evaluated on AMD Instinct MI250X GPUs, Stream-K++ demonstrates significant performance gains (up to 43%) in select scenarios. It remains…
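The Bloom filter-based selection described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes one filter per scheduling policy, populated offline with the problem sizes for which that policy performed well, and queried at run time with the (M, N, K) of the incoming GEMM. The class, the helper names, the filter sizes, and the tuning entries are all hypothetical, and the entries are made-up placeholders rather than measurements. The one property relied on is standard for Bloom filters: a "not present" answer is always correct, so a policy that was recorded for a problem size is never discarded, while an occasional false positive only leaves an extra candidate to evaluate.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter over hashable keys (illustrative sizes)."""

    def __init__(self, num_bits=4096, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by prefixing a seed byte.
        data = repr(key).encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + data).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely never added"; True means "possibly added".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


def build_filters(tuning_results, num_policies):
    """One filter per policy, filled from (problem_size, policy) pairs."""
    filters = [BloomFilter() for _ in range(num_policies)]
    for problem_size, policy in tuning_results:
        filters[policy].add(problem_size)
    return filters


def candidate_policies(filters, problem_size):
    """Keep only the policies whose filter does not rule the size out."""
    return [p for p, f in enumerate(filters) if f.might_contain(problem_size)]


# Hypothetical placeholder tuning data: ((M, N, K), policy index).
TUNING_RESULTS = [
    ((1024, 1024, 1024), 0),
    ((4096, 512, 2048), 3),
    ((128, 8192, 256), 6),
]
FILTERS = build_filters(TUNING_RESULTS, num_policies=7)
survivors = candidate_policies(FILTERS, (4096, 512, 2048))
```

Each query costs a handful of hash computations per policy, which is what makes this kind of pre-filtering cheap compared with benchmarking every configuration.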
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques
