Efficient algorithms for computing rank-revealing factorizations on a GPU
Nathan Heavner, Chao Chen, Abinand Gopal, Per-Gunnar Martinsson

TL;DR
This paper introduces two GPU-friendly algorithms for rank-revealing factorizations that use randomized projections to cast most of the work as matrix-matrix multiplication. They run about an order of magnitude faster than GPU SVD implementations while giving comparable low-rank approximation accuracy.
Contribution
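The paper presents two randomized algorithms for rank-revealing factorizations that run efficiently on GPUs. They avoid a key limitation of standard methods such as the SVD and column pivoted QR, which cannot cast most of their operations in terms of the Level-3 BLAS (matrix-matrix operations).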
Findings
Achieve an order-of-magnitude speedup over finely tuned GPU implementations of the SVD.
Maintain low-rank approximation errors comparable to the SVD.
Use randomized projections to maximize matrix-matrix operations on GPU.
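The randomized-projection idea can be illustrated with the standard randomized range finder. The NumPy sketch that follows is not taken from the paper; the function name `randomized_range_finder` and the parameters `k` (target rank), `p` (oversampling) and `q` (power iterations) are illustrative assumptions. It shows how a Gaussian sketch reduces the work to matrix-matrix products and QR factorizations of tall, thin matrices, the operations GPUs handle best.

```python
import numpy as np

def randomized_range_finder(A, k, p=10, q=1, seed=0):
    """Orthonormal basis Q with k + p columns approximately spanning the
    dominant column space of A (hypothetical helper for illustration).

    Every step is a matrix-matrix product or a QR factorization of a tall,
    thin matrix, so the work maps onto Level-3 BLAS.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    G = rng.standard_normal((n, k + p))   # Gaussian test matrix
    Q, _ = np.linalg.qr(A @ G)            # sketch Y = A G, then orthonormalize
    for _ in range(q):                    # power iterations sharpen the basis;
        Z, _ = np.linalg.qr(A.T @ Q)      # re-orthonormalizing after each
        Q, _ = np.linalg.qr(A @ Z)        # product keeps them numerically stable
    return Q

# Example: a 300 x 200 matrix with singular values s_j = 0.5**j.
rng = np.random.default_rng(1)
m, n, k = 300, 200, 20
U0, _ = np.linalg.qr(rng.standard_normal((m, n)))
V0, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = 0.5 ** np.arange(n)
A = (U0 * s) @ V0.T

Q = randomized_range_finder(A, k)
err = np.linalg.norm(A - Q @ (Q.T @ A), 2)   # spectral-norm projection error
print(f"projection error {err:.2e}, SVD rank-{k} error {s[k]:.2e}")
```

Because `Q` has k + p columns, the projection error is usually well below s[k], the best rank-k error the SVD can achieve.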
Abstract
Standard rank-revealing factorizations such as the singular value decomposition and column pivoted QR factorization are challenging to implement efficiently on a GPU. A major difficulty in this regard is the inability of standard algorithms to cast most operations in terms of the Level-3 BLAS. This paper presents two alternative algorithms for computing a rank-revealing factorization of the form $A = UTV^*$, where $U$ and $V$ are orthogonal and $T$ is triangular. Both algorithms use randomized projection techniques to cast most of the flops in terms of matrix-matrix multiplication, which is exceptionally efficient on the GPU. Numerical experiments illustrate that these algorithms achieve an order of magnitude acceleration over finely tuned GPU implementations of the SVD while providing low-rank approximation errors close to those of the SVD.
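A factorization of the form $A = UTV^*$ can be built from randomized projections plus unpivoted QR alone. The NumPy sketch that follows assumes a real matrix with m ≥ n and uses a power-iteration construction chosen for illustration; it is not necessarily either of the paper's two algorithms, and `randomized_utv` and its parameter `q` are hypothetical names. V is an orthonormal basis derived from the sketch $(A^T A)^q G$ with Gaussian $G$, and an unpivoted QR of $AV$ then supplies $U$ and an upper triangular $T$.

```python
import numpy as np

def randomized_utv(A, q=3, seed=0):
    """Illustrative sketch: A = U @ T @ V.T with V orthogonal, U with
    orthonormal columns (economy size), and T upper triangular.

    Assumes real A with m >= n. V spans (A^T A)^q G, computed by subspace
    iteration with a QR after every product with A or A^T for stability.
    No column pivoting and no SVD are used.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    V, _ = np.linalg.qr(rng.standard_normal((n, n)))  # random orthogonal start
    for _ in range(q):
        Q, _ = np.linalg.qr(A @ V)        # orthonormal basis for A V
        V, _ = np.linalg.qr(A.T @ Q)      # orthonormal basis for A^T A V
    U, T = np.linalg.qr(A @ V)            # A V = U T, T upper triangular
    return U, T, V

# Example: singular values s_j = 0.2**j, truncated at rank k.
rng = np.random.default_rng(1)
m, n, k = 200, 120, 8
U0, _ = np.linalg.qr(rng.standard_normal((m, n)))
V0, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = 0.2 ** np.arange(n)
A = (U0 * s) @ V0.T

U, T, V = randomized_utv(A)
A_k = U[:, :k] @ T[:k, :] @ V.T           # keep k columns of U, k rows of T
err = np.linalg.norm(A - A_k, 2)
print(f"UTV rank-{k} error {err:.2e}, SVD optimum {s[k]:.2e}")
```

Since the leading columns of V approximate the dominant right singular vectors, truncating the factorization typically gives an error close to the SVD optimum s[k]. Replacing column pivoting with a randomized sketch is what keeps the work largely in matrix-matrix form.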
Taxonomy
Topics: Matrix Theory and Algorithms · Tensor decomposition and applications
