Efficient GPU-Centered Singular Value Decomposition Using the Divide-and-Conquer Method
Shifang Liu, Huiyuan Li, Hongjiao Sheng, Haoyuan Gui, and Xiaoyu Zhang

TL;DR
This paper presents a GPU-centered SVD algorithm that eliminates CPU-GPU data transfers and optimizes GPU computations in heterogeneous systems, reporting speedups of up to 1293.64x over rocSOLVER on AMD GPUs and up to 14.10x over MAGMA on NVIDIA GPUs.
Contribution
The paper introduces a GPU-centered SVD algorithm built around a novel GPU-based bidiagonal divide-and-conquer (BDC) method. All panel-level computations and trailing matrix updates run on the GPU, which removes the CPU-GPU data transfer bottleneck and improves computational efficiency.
Findings
Achieves up to 1293.64x speedup over rocSOLVER on AMD GPUs.
Achieves up to 14.10x speedup over MAGMA on NVIDIA GPUs.
Eliminates CPU-GPU data transfers for SVD, enabling asynchronous execution.
Abstract
Singular Value Decomposition (SVD) is a fundamental matrix factorization technique in linear algebra, widely applied in numerous matrix-related problems. However, traditional SVD approaches are hindered by slow panel factorization and frequent CPU-GPU data transfers in heterogeneous systems, despite advancements in GPU computational capabilities. In this paper, we introduce a GPU-centered SVD algorithm, incorporating a novel GPU-based bidiagonal divide-and-conquer (BDC) method. We reformulate the algorithm and data layout of different steps for SVD computation, performing all panel-level computations and trailing matrix updates entirely on GPU to eliminate CPU-GPU data transfers. Furthermore, we integrate related computations to optimize BLAS utilization, thereby increasing arithmetic intensity and fully leveraging the computational capabilities of GPUs. Additionally, we introduce a…
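For readers unfamiliar with where a bidiagonal solver sits in an SVD computation, the sketch below illustrates the standard three-phase structure: reduce the matrix to upper bidiagonal form with Householder reflectors, compute the SVD of the small bidiagonal matrix, then back-transform the singular vectors. It is a CPU-only NumPy illustration under stated assumptions, not the authors' implementation: the reduction is unblocked (no panel/trailing-matrix split), `np.linalg.svd` stands in for the BDC solver, and the function names `householder`, `bidiagonalize`, and `svd_via_bidiagonal` are hypothetical.

```python
import numpy as np


def householder(x):
    """Return a unit vector v so that (I - 2 v v^T) x is a multiple of e1."""
    v = x.astype(float).copy()
    norm_x = np.linalg.norm(v)
    if norm_x == 0.0:
        return v  # zero input: v = 0 makes the reflector the identity
    v[0] += np.copysign(norm_x, v[0])  # sign choice avoids cancellation
    return v / np.linalg.norm(v)


def bidiagonalize(A):
    """Unblocked Golub-Kahan reduction: A = U @ B @ Vt, B upper bidiagonal.

    Assumes A has shape (m, n) with m >= n.
    """
    m, n = A.shape
    B = A.astype(float).copy()
    U = np.eye(m)
    Vt = np.eye(n)
    for k in range(n):
        # Left reflector: zero out column k below the diagonal.
        v = householder(B[k:, k])
        B[k:, :] -= 2.0 * np.outer(v, v @ B[k:, :])
        U[:, k:] -= 2.0 * np.outer(U[:, k:] @ v, v)
        if k < n - 2:
            # Right reflector: zero out row k right of the superdiagonal.
            w = householder(B[k, k + 1:])
            B[:, k + 1:] -= 2.0 * np.outer(B[:, k + 1:] @ w, w)
            Vt[k + 1:, :] -= 2.0 * np.outer(w, w @ Vt[k + 1:, :])
    return U, B, Vt


def svd_via_bidiagonal(A):
    """Thin SVD of A via bidiagonalization, a bidiagonal SVD, and back-transformation."""
    n = A.shape[1]
    U1, B, V1t = bidiagonalize(A)
    # Stand-in for a bidiagonal divide-and-conquer solver (assumption:
    # any correct SVD of the n x n bidiagonal block works here).
    U2, s, V2t = np.linalg.svd(B[:n, :n])
    # Back-transform the singular vectors to those of A.
    return U1[:, :n] @ U2, s, V2t @ V1t
```

Calling `svd_via_bidiagonal(A)` on a tall matrix returns factors `U`, `s`, `Vt` with `U @ np.diag(s) @ Vt` reproducing `A` up to rounding. In a GPU-centered design such as the one the abstract describes, each of these phases would run on the device; the sketch only shows how the phases fit together.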
