Implementation of QR factorization of tall and very skinny matrices on current GPUs
Jonas Thies, Melven Röhrig-Zöllner

TL;DR
This paper explores efficient GPU implementations of QR decomposition for tall, skinny matrices, comparing methods like Gramian-based algorithms and TSQR, emphasizing optimization techniques to overcome memory bandwidth limitations.
Contribution
It introduces optimized GPU algorithms for QR factorization of tall-skinny matrices, including Q-less QR and shared memory exploitation, and compares their performance and complexity.
Findings
TSQR is competitive with the Gramian-based methods in time-to-solution.
Memory-bound regimes require specialized, optimized implementations.
Q-less QR reduces memory write-backs and improves performance.
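As an illustration of the Q-less idea, the following is a minimal CPU sketch in plain Python, not the authors' GPU implementation. It reduces row blocks to a single R factor through a binary tree of small Householder factorizations and never forms or stores Q. The names `householder_r`, `tsqr_r`, `gram` and the `block_rows` parameter are hypothetical, and the test matrix is made up for the example.

```python
import math

def householder_r(block):
    """Return only the upper triangular factor R of a small dense block.

    Householder reflections are applied to a copy of the block;
    the orthogonal factor Q is never formed or stored.
    """
    a = [list(row) for row in block]
    n = len(a[0])
    while len(a) < n:                      # pad short blocks with zero rows
        a.append([0.0] * n)
    m = len(a)
    for k in range(n):
        norm = math.sqrt(sum(a[i][k] ** 2 for i in range(k, m)))
        if norm == 0.0:
            continue                       # nothing to eliminate in this column
        v = [a[i][k] for i in range(k, m)]
        v[0] += math.copysign(norm, v[0])  # sign choice avoids cancellation
        vv = sum(x * x for x in v)
        for j in range(k, n):              # apply H = I - 2 v v^T / (v^T v)
            s = 2.0 * sum(v[i] * a[k + i][j] for i in range(m - k)) / vv
            for i in range(m - k):
                a[k + i][j] -= s * v[i]
    return [[a[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]

def tsqr_r(rows, block_rows):
    """Q-less TSQR: reduce row blocks to a single R via a binary tree."""
    factors = [householder_r(rows[i:i + block_rows])
               for i in range(0, len(rows), block_rows)]
    while len(factors) > 1:
        merged = []
        for i in range(0, len(factors) - 1, 2):
            # stack two n-by-n triangles and factor the 2n-by-n result
            merged.append(householder_r(factors[i] + factors[i + 1]))
        if len(factors) % 2:
            merged.append(factors[-1])     # odd one out moves up unchanged
        factors = merged
    return factors[0]

def gram(rows):
    """Return rows^T rows as a small dense matrix."""
    n = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]

# Made-up 40-by-3 test matrix with linearly independent columns.
A = [[math.sin(1.0 + i * (j + 1)) for j in range(3)] for i in range(40)]
R = tsqr_r(A, block_rows=8)

# R is only unique up to row signs, so check R^T R against A^T A instead.
err = max(abs(x - y) for gr, ga in zip(gram(R), gram(A)) for x, y in zip(gr, ga))
```

Each input row is read once and only n-by-n triangles travel up the tree, which is why skipping the Q write-back matters when the computation is limited by memory bandwidth.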
Abstract
We consider the problem of computing a QR (or QZ) decomposition of a real, dense, tall and very skinny matrix. That is, the number of columns is tiny compared to the number of rows, rendering most computations completely or partially memory-bandwidth limited. The paper focuses on recent NVIDIA GPGPUs still supporting 64-bit floating-point arithmetic, but the findings carry over to AMD GPUs as well. We discuss two basic algorithms: methods based on the normal equations (Gram matrix), in particular Cholesky-QR2 and SVQB, and the "tall-skinny QR" (TSQR), based on Householder transformations in a tree-reduction scheme. We propose two primary optimization techniques: avoiding the write-back of the Q factor ("Q-less QR"), and exploiting fast local memory (shared memory on GPUs). We compare a straightforward implementation of Gramian-based methods, and a more sophisticated TSQR…
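To make the Gramian-based approach concrete, here is a minimal sketch of Cholesky-QR2 in plain Python. It is a CPU illustration of the textbook algorithm under the assumption that the Gram matrix is numerically positive definite, not the implementation evaluated in the paper. All function names and the test matrix are hypothetical, and SVQB is not shown.

```python
import math

def gram(rows):
    """G = A^T A, accumulated in a single pass over the rows of A."""
    n = len(rows[0])
    g = [[0.0] * n for _ in range(n)]
    for r in rows:
        for i in range(n):
            for j in range(n):
                g[i][j] += r[i] * r[j]
    return g

def cholesky_upper(g):
    """Upper triangular R with R^T R = g (g must be positive definite)."""
    n = len(g)
    r = [[0.0] * n for _ in range(n)]
    for i in range(n):
        d = g[i][i] - sum(r[k][i] ** 2 for k in range(i))
        r[i][i] = math.sqrt(d)             # fails if g is not numerically SPD
        for j in range(i + 1, n):
            r[i][j] = (g[i][j] - sum(r[k][i] * r[k][j] for k in range(i))) / r[i][i]
    return r

def solve_right(rows, r):
    """Return A R^{-1} by solving q R = a for every row a of A."""
    n = len(r)
    out = []
    for a in rows:
        q = [0.0] * n
        for j in range(n):
            q[j] = (a[j] - sum(q[k] * r[k][j] for k in range(j))) / r[j][j]
        out.append(q)
    return out

def cholesky_qr(rows):
    """One Cholesky-QR pass: R from the Gram matrix, then Q = A R^{-1}."""
    r = cholesky_upper(gram(rows))
    return solve_right(rows, r), r

def cholesky_qr2(rows):
    """Two passes; the second repairs the orthogonality lost in the first."""
    q1, r1 = cholesky_qr(rows)
    q, r2 = cholesky_qr(q1)
    n = len(r1)
    # Combined triangular factor R = R2 R1, so that A = Q R.
    r = [[sum(r2[i][k] * r1[k][j] for k in range(n)) for j in range(n)]
         for i in range(n)]
    return q, r

# Made-up 40-by-3 test matrix with linearly independent columns.
A = [[math.sin(1.0 + i * (j + 1)) for j in range(3)] for i in range(40)]
Q, R = cholesky_qr2(A)
n = len(R)

# Largest deviation of Q^T Q from the identity, and of Q R from A.
orth_err = max(abs(sum(q[i] * q[j] for q in Q) - float(i == j))
               for i in range(n) for j in range(n))
resid = max(abs(sum(Q[t][k] * R[k][j] for k in range(n)) - A[t][j])
            for t in range(len(A)) for j in range(n))
```

Every pass reads the tall matrix twice (Gram accumulation, then the triangular solve) and writes Q once. Dropping that write when only R is needed is the saving the Q-less variant targets.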
Taxonomy
Topics: Matrix Theory and Algorithms · Numerical Methods and Algorithms · Parallel Computing and Optimization Techniques
