An Efficient Batch Solver for the Singular Value Decomposition on GPUs
Ahmad Abdelfattah, Massimiliano Fasi

TL;DR
This paper introduces a high-performance GPU-based batch SVD solver that leverages the one-sided Jacobi algorithm and various optimizations, outperforming existing solutions across diverse problem types.
Contribution
The paper presents a novel GPU-oriented batch SVD solver using the one-sided Jacobi algorithm with multiple optimizations for superior performance.
Findings
The solver achieves significant speedups over vendor and open-source solutions on both NVIDIA and AMD systems.
It is robust across problems with different numerical properties, shapes, and precisions.
Abstract
The singular value decomposition (SVD) is a powerful tool in modern numerical linear algebra, which underpins computational methods such as principal component analysis (PCA), low-rank approximations, and randomized algorithms. Many practical scenarios require solving numerous small SVD problems, a regime generally referred to as "batch SVD". Existing programming models can handle this efficiently on parallel CPU architectures, but high-performance solutions for GPUs remain immature. A GPU-oriented batch SVD solver is introduced. This solver uses the one-sided Jacobi algorithm to exploit fine-grained parallelism, and a number of algorithmic and design optimizations achieve unmatched performance. Starting from a baseline solver, a sequence of optimizations is applied to obtain incremental performance gains. Numerical experiments show that the new solver is robust across problems with…
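To make the abstract's central idea concrete, the following is a minimal CPU sketch of the textbook one-sided Jacobi (Hestenes) iteration applied to a batch of small matrices. It is not the paper's GPU solver and includes none of its optimizations. The function names `jacobi_singular_values` and `batch_singular_values`, the tolerance, and the sweep limit are all assumptions chosen for illustration.

```python
import math

def jacobi_singular_values(a, tol=1e-12, max_sweeps=30):
    """Singular values (descending) of `a`, given as a list of rows.

    Illustrative textbook one-sided Jacobi; hypothetical helper,
    not the solver described in the paper.
    """
    m, n = len(a), len(a[0])
    # Store columns as lists so each rotation touches two of them.
    cols = [[a[i][j] for i in range(m)] for j in range(n)]
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(x * x for x in cols[p])
                beta = sum(x * x for x in cols[q])
                gamma = sum(x * y for x, y in zip(cols[p], cols[q]))
                # Skip pairs that are already numerically orthogonal.
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue
                rotated = True
                # Rotation angle that makes columns p and q orthogonal.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                cols[p], cols[q] = (
                    [c * x - s * y for x, y in zip(cols[p], cols[q])],
                    [s * x + c * y for x, y in zip(cols[p], cols[q])],
                )
        if not rotated:
            break  # every column pair is orthogonal: converged
    # After convergence the column norms are the singular values.
    return sorted((math.sqrt(sum(x * x for x in col)) for col in cols), reverse=True)

def batch_singular_values(batch):
    """Apply the solver to each matrix; the problems are independent."""
    return [jacobi_singular_values(a) for a in batch]

if __name__ == "__main__":
    batch = [[[3.0, 0.0], [0.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]]]
    for sigma in batch_singular_values(batch):
        print(sigma)
```

Each rotation involves only two columns, so rotations on disjoint column pairs do not interact, and the matrices in a batch are independent of one another. This sequential sketch does not exploit either property, but both are natural sources of the fine-grained parallelism the abstract refers to.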
