Parallel implementation of fast randomized algorithms for the decomposition of low rank matrices
Andrew Lucas, Mark Stalzer, John Feo

TL;DR
This paper evaluates the parallel performance of randomized low-rank matrix decomposition algorithms on a supercomputer, demonstrating significant speedups and confirming error bounds on large matrices.
Contribution
It provides a detailed analysis of parallel implementation efficiency for randomized matrix decompositions on large-scale matrices using a supercomputer.
Findings
Performance improves significantly on non-square matrices, achieving over 70x speedup with 128 processors.
Error bounds from previous studies hold for matrices nearly 100 times larger than before.
The randomized interpolative decomposition parallelizes well on the Cray XMT, a machine that closely approximates an ideal PRAM model.
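To put the reported speedup in context, the short sketch below converts "over 70x on 128 processors" into a parallel efficiency and an estimated serial fraction. The serial fraction uses the Karp-Flatt metric, which assumes an Amdahl-style model of the runtime; that model is an assumption made here for illustration and is not taken from the paper. Because the paper reports a speedup of "over 70", the efficiency is a lower bound and the serial fraction an upper bound.

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / processors


def karp_flatt_serial_fraction(speedup: float, processors: int) -> float:
    """Experimentally determined serial fraction (Karp-Flatt metric).

    Assumes an Amdahl-style runtime model; this is an illustrative
    assumption, not a claim made in the paper.
    """
    return (1.0 / speedup - 1.0 / processors) / (1.0 - 1.0 / processors)


# Figures from the summary: speedup of (at least) 70 on 128 processors.
efficiency = parallel_efficiency(70.0, 128)
serial_fraction = karp_flatt_serial_fraction(70.0, 128)

print(f"efficiency >= {efficiency:.1%}")            # efficiency >= 54.7%
print(f"serial fraction <= {serial_fraction:.2%}")  # serial fraction <= 0.65%
```

Under these assumptions, the run retains a little over half of ideal linear scaling, which is consistent with a serial (or otherwise non-scaling) share of well under 1% of the work.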
Abstract
We analyze the parallel performance of randomized interpolative decomposition by decomposing low-rank complex-valued Gaussian random matrices of up to 64 GB. We chose a Cray XMT supercomputer because it provides an almost ideal PRAM model, permitting quick investigation of parallel algorithms without obfuscation from hardware idiosyncrasies. On non-square matrices, performance is very good, with overall runtime over 70 times faster on 128 processors. We also verify that numerically discovered error bounds still hold on matrices nearly two orders of magnitude larger than those previously tested.
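For readers unfamiliar with the technique, the following is a minimal serial sketch of a randomized interpolative decomposition on a small low-rank complex Gaussian matrix. It is not the authors' Cray XMT implementation: the function names, the oversampling value, and the matrix sizes are all hypothetical choices made for illustration. The idea is to compress the rows of A with a random Gaussian matrix, run a column-pivoted QR on the small sketch, and reuse the resulting column selection and interpolation coefficients for A itself.

```python
import numpy as np
from scipy.linalg import qr, solve_triangular


def low_rank_complex_gaussian(m, n, k, rng):
    """Rank-k test matrix: product of two complex Gaussian factors."""
    left = rng.standard_normal((m, k)) + 1j * rng.standard_normal((m, k))
    right = rng.standard_normal((k, n)) + 1j * rng.standard_normal((k, n))
    return left @ right


def randomized_id(A, k, oversample=8, rng=None):
    """Return (cols, P) such that A is approximately A[:, cols] @ P.

    `oversample` is an illustrative default, not a value from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    l = k + oversample
    # Compress the m rows of A down to l rows with a random Gaussian matrix.
    G = rng.standard_normal((l, m)) + 1j * rng.standard_normal((l, m))
    Y = G @ A
    # Column-pivoted QR of the small sketch: Y[:, perm] = Q @ R.
    _, R, perm = qr(Y, mode="economic", pivoting=True)
    # Interpolation coefficients T solve R11 @ T = R12.
    T = solve_triangular(R[:k, :k], R[:k, k:])
    # Undo the pivoting so that P lines up with the original column order.
    P = np.zeros((k, n), dtype=A.dtype)
    P[:, perm[:k]] = np.eye(k)
    P[:, perm[k:]] = T
    return perm[:k], P


rng = np.random.default_rng(0)
A = low_rank_complex_gaussian(200, 300, 10, rng)
cols, P = randomized_id(A, 10, rng=rng)
rel_err = np.linalg.norm(A - A[:, cols] @ P, 2) / np.linalg.norm(A, 2)
print(f"relative spectral-norm error: {rel_err:.2e}")
```

In this sketch the dominant costs are the matrix product `G @ A` and the final comparison against `A`, both of which are dense matrix multiplications; the pivoted QR acts only on the much smaller sketch `Y`.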
