TL;DR
This paper introduces a parallel, blocked, one-sided Hari–Zimmermann algorithm for the generalized SVD (GSVD) of a matrix pair. It runs on a single GPU or a GPU cluster and emphasizes scalability, minimal memory use, and bitwise reproducible output for a fixed input and GPU count.
Contribution
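It presents a GPU-based parallel GSVD algorithm that performs all non-trivial computation on the GPUs, scales acceptably as GPUs are added, keeps memory use minimal, and produces bitwise identical output when a run is repeated on the same input with the same number of GPUs.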
Findings
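Scales acceptably as the number of available GPUs increases.
Requires the minimal amount of memory that can reasonably be expected.
Produces bitwise identical output across repeated runs on the same input with the same number of GPUs.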
Abstract
A parallel, blocked, one-sided Hari–Zimmermann algorithm for the generalized singular value decomposition (GSVD) of a real or a complex matrix pair (F, G) is here proposed, where F and G have the same number of columns, and are both of full column rank. The algorithm targets either a single graphics processing unit (GPU), or a cluster of those, performs all non-trivial computation exclusively on the GPUs, requires the minimal amount of memory to be reasonably expected, scales acceptably with the increase of the number of GPUs available, and guarantees reproducible, bitwise identical output of runs repeated over the same input and with the same number of GPUs.
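To make concrete what a GSVD of a full-column-rank pair delivers, here is a minimal dense CPU sketch in NumPy. It is not the Hari–Zimmermann algorithm and not the paper's GPU code: it swaps in a plain QR factorization followed by an ordinary SVD, which is only suitable as a small-scale reference. The function name `gsvd_reference`, the returned matrix `Z`, and the normalization sigma_F^2 + sigma_G^2 = 1 are assumptions made for this illustration.

```python
import numpy as np

def gsvd_reference(F, G):
    """Dense reference GSVD of a full-column-rank pair (F, G).

    Returns U, V, sigma_F, sigma_G, Z such that
        F @ Z == U * sigma_F   and   G @ Z == V * sigma_G,
    where U and V have orthonormal columns and
    sigma_F**2 + sigma_G**2 == 1 (an assumed normalization).
    Illustration only: QR + SVD, not the Hari-Zimmermann iteration.
    """
    # G = Q R; R is nonsingular because G has full column rank.
    Q, R = np.linalg.qr(G)
    # Form F R^{-1} by solving X R = F (avoids an explicit inverse).
    FRinv = np.linalg.solve(R.conj().T, F.conj().T).conj().T
    # F R^{-1} = U diag(s) W^H; s holds the generalized singular values.
    U, s, Wh = np.linalg.svd(FRinv, full_matrices=False)
    W = Wh.conj().T
    V = Q @ W                      # orthonormal columns
    Z0 = np.linalg.solve(R, W)     # F Z0 = U diag(s), G Z0 = V
    # Rescale the columns so that sigma_F^2 + sigma_G^2 = 1.
    sigma_G = 1.0 / np.sqrt(1.0 + s**2)
    sigma_F = s * sigma_G
    Z = Z0 * sigma_G               # scales column j by sigma_G[j]
    return U, V, sigma_F, sigma_G, Z

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    F = rng.standard_normal((7, 4))
    G = rng.standard_normal((5, 4))
    U, V, sF, sG, Z = gsvd_reference(F, G)
    print("residual F:", np.linalg.norm(F @ Z - U * sF))
    print("residual G:", np.linalg.norm(G @ Z - V * sG))
```

The generalized singular values of the pair are the ratios `sigma_F / sigma_G`. A one-sided method such as the one proposed in the abstract instead works on the columns of F and G directly, which is what makes it amenable to blocking and to GPU parallelism; the sketch above does not attempt to reproduce that, nor the bitwise reproducibility guarantee.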
