Parallel implementation of fast randomized algorithms for the decomposition of low rank matrices

Andrew Lucas; Mark Stalzer; John Feo

arXiv:1205.3830·cs.DC·April 2, 2014

TL;DR

This paper evaluates the parallel performance of randomized interpolative decomposition of low-rank matrices on a Cray XMT supercomputer. It reports runtimes over 70 times faster on 128 processors for non-square matrices and confirms that previously observed error bounds hold on much larger matrices.

Contribution

The paper measures how well randomized interpolative decomposition parallelizes on a Cray XMT, using low-rank complex-valued Gaussian random matrices of up to 64 GB.

Findings

01

Performance improves significantly on non-square matrices, achieving over 70x speedup with 128 processors.

02

Numerically discovered error bounds from earlier work still hold on matrices nearly two orders of magnitude larger than those previously tested.

03

The algorithm parallelizes well on the Cray XMT, which the authors chose because it closely approximates an ideal PRAM model.
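
To put the reported speedup in context, the short calculation below converts it into a parallel efficiency. It is only an illustration: the function name is hypothetical, the figure of 70 is the lower bound quoted in the findings ("over 70x"), and it assumes the speedup is measured against a single-processor run, which this summary does not state.

```python
def parallel_efficiency(speedup, processors):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / processors

# Lower bound from the findings: "over 70x speedup with 128 processors".
# Assumption: the baseline is a run on one processor.
speedup_lower_bound = 70.0
processors = 128

efficiency = parallel_efficiency(speedup_lower_bound, processors)
print(f"parallel efficiency is at least {efficiency:.3f}")
```

Under these assumptions the efficiency is at least 70/128, or roughly 55% of ideal linear scaling.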

Abstract

We analyze the parallel performance of randomized interpolative decomposition by decomposing low rank complex-valued Gaussian random matrices up to 64 GB. We chose a Cray XMT supercomputer as it provides an almost ideal PRAM model permitting quick investigation of parallel algorithms without obfuscation from hardware idiosyncrasies. We find that performance is very good on non-square matrices, with overall runtime over 70 times faster on 128 processors. We also verify that numerically discovered error bounds still hold on matrices nearly two orders of magnitude larger than those previously tested.
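
For readers unfamiliar with the technique, here is a minimal serial sketch of a randomized interpolative decomposition applied to a small low-rank complex Gaussian test matrix. It is not the authors' Cray XMT implementation. It assumes a dense Gaussian sketching matrix, a simple greedy column-pivoting step, and a least-squares solve for the interpolation coefficients; all function names, the oversampling parameter, and the matrix sizes are illustrative choices.

```python
import numpy as np

def low_rank_complex_gaussian(m, n, k, rng):
    """Rank-k test matrix: product of two complex Gaussian factors."""
    left = rng.standard_normal((m, k)) + 1j * rng.standard_normal((m, k))
    right = rng.standard_normal((k, n)) + 1j * rng.standard_normal((k, n))
    return left @ right

def pivoted_columns(Y, k):
    """Greedy column pivoting: repeatedly pick the largest remaining column."""
    R = Y.astype(complex)  # astype returns a copy, so Y is left untouched
    chosen = []
    for _ in range(k):
        norms = np.linalg.norm(R, axis=0)
        j = int(np.argmax(norms))
        chosen.append(j)
        q = R[:, j] / norms[j]
        # Project the chosen direction out of every column.
        R = R - np.outer(q, q.conj() @ R)
    return chosen

def randomized_id(A, k, oversample=8, rng=None):
    """Return column indices `cols` and coefficients P with A ~ A[:, cols] @ P."""
    rng = np.random.default_rng() if rng is None else rng
    m, _ = A.shape
    l = k + oversample
    # Assumption: a dense complex Gaussian sketch of size l x m.
    G = rng.standard_normal((l, m)) + 1j * rng.standard_normal((l, m))
    Y = G @ A  # small l x n sketch of A
    cols = pivoted_columns(Y, k)
    # Interpolation coefficients from a least-squares fit on the sketch.
    P, *_ = np.linalg.lstsq(Y[:, cols], Y, rcond=None)
    return cols, P

rng = np.random.default_rng(0)
A = low_rank_complex_gaussian(200, 120, 10, rng)
cols, P = randomized_id(A, k=10, rng=rng)
rel_err = np.linalg.norm(A - A[:, cols] @ P, 2) / np.linalg.norm(A, 2)
print(f"relative spectral-norm error: {rel_err:.2e}")
```

The expensive work is done on the small sketch `Y` rather than on `A` itself. For an exactly rank-10 input such as this one, the relative error should be close to machine precision.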
