FRSZ2 for In-Register Block Compression Inside GMRES on GPUs
Thomas Grützmacher, Robert Underwood, Sheng Di, Franck Cappello, Hartwig Anzt

TL;DR
This paper introduces FRSZ2, an in-register compressor for the Krylov basis in GMRES on GPUs, which yields larger runtime savings than casting the basis to low precision without affecting the accuracy of the final solution.
Contribution
The paper presents FRSZ2, a lightweight in-register compressor that decompresses at the bandwidth speed of a modern NVIDIA H100 GPU, improving GMRES performance over storing the Krylov basis in low precision.
Findings
FRSZ2 achieves decompression at GPU bandwidth speed.
Using FRSZ2 to compress the Krylov basis yields larger runtime benefits than casting it to low precision.
Final solution accuracy remains unaffected.
Abstract
The performance of the GMRES iterative solver on GPUs is limited by the GPU main memory bandwidth. Compressed Basis GMRES outperforms GMRES by storing the Krylov basis in low precision, thereby reducing the memory access. An open question is whether compression techniques that are more sophisticated than casting to low precision can enable large runtime savings while preserving the accuracy of the final results. This paper presents the lightweight in-register compressor FRSZ2 that can decompress at the bandwidth speed of a modern NVIDIA H100 GPU. In an experimental evaluation, we demonstrate that using FRSZ2 instead of low precision for compression of the Krylov basis can bring larger runtime benefits without impacting final accuracy.
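To make the low-precision baseline mentioned in the abstract concrete, the following is a minimal sketch of the idea of storing a basis vector in 32-bit floats while performing the arithmetic in 64-bit. It is not FRSZ2 and not the authors' implementation; the function names (`compress_f32`, `dot_mixed`) and the test vector are hypothetical and chosen only for illustration.

```python
from array import array

def compress_f32(vec):
    """Store a double-precision vector as float32 (the 'cast to low precision' baseline)."""
    return array('f', vec)

def dot_mixed(stored, x):
    """Dot product that reads low-precision storage but accumulates in double precision."""
    # Each element read from array('f') becomes a Python float (a double),
    # so only the storage is reduced, not the arithmetic precision.
    return sum(s * xi for s, xi in zip(stored, x))

# Hypothetical basis vector and right-hand operand, for illustration only.
basis_vec = [1.0 / (i + 1) for i in range(1000)]
x = [1.0] * 1000

stored = compress_f32(basis_vec)   # 4 bytes per entry
full = array('d', basis_vec)       # 8 bytes per entry

exact = sum(b * xi for b, xi in zip(basis_vec, x))
approx = dot_mixed(stored, x)
rel_err = abs(approx - exact) / abs(exact)

bytes_f64 = full.itemsize * len(full)
bytes_f32 = stored.itemsize * len(stored)
print(f"storage: {bytes_f64} -> {bytes_f32} bytes, relative error {rel_err:.2e}")
```

Halving the bytes read per basis vector is what helps a bandwidth-bound solver; the question the paper raises is whether a more sophisticated compressor can do better than this plain cast at the same memory footprint.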
Taxonomy
Topics: Embedded Systems Design Techniques
