Communication-optimal parallel and sequential QR and LU factorizations
James Demmel, Laura Grigori, Mark Hoemmen, and Julien Langou

TL;DR
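This paper presents parallel and sequential dense QR factorization algorithms that attain communication lower bounds up to polylogarithmic factors while remaining as stable as Householder QR. It shows that existing LAPACK and ScaLAPACK algorithms communicate asymptotically more, and it points to recent LU algorithms in the literature that attain at least some of the same bounds.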
Contribution
It extends known bandwidth lower bounds for sequential and parallel matrix multiplication to latency lower bounds, shows that these bounds apply to the LU and QR factorizations, and develops QR algorithms that attain them up to polylogarithmic factors.
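The step from a bandwidth bound to a latency bound can be illustrated with a little arithmetic. The sketch below is an assumption-laden illustration, not a formula quoted from this summary: it uses the standard sequential matrix-multiplication bandwidth bound of order n^3 / sqrt(M) words for fast memory of size M, drops all constant factors, and assumes that no single message can carry more than M words, so the message count is at least the word count divided by M. The function name and parameters are hypothetical.

```python
import math


def sequential_lower_bounds(n, fast_memory_words):
    """Order-of-magnitude communication lower bounds for an n x n problem.

    Assumptions (not stated in the summary above): constants are dropped,
    and a message can hold at most `fast_memory_words` words.
    """
    M = fast_memory_words
    # Bandwidth bound: words moved between slow and fast memory.
    words = n**3 / math.sqrt(M)
    # Latency bound: each message carries at most M words.
    messages = words / M
    return words, messages


words, messages = sequential_lower_bounds(n=1024, fast_memory_words=4096)
print(f"words >= ~{words:.0f}, messages >= ~{messages:.0f}")
```

With these assumptions the latency bound scales as n^3 / M^(3/2), which is why an algorithm can be bandwidth-efficient and still send far too many small messages.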
Findings
QR algorithms attain the communication lower bounds up to polylogarithmic factors
Existing LAPACK and ScaLAPACK algorithms perform asymptotically more communication
Recent LU algorithms in the literature attain at least some of these lower bounds
Abstract
We present parallel and sequential dense QR factorization algorithms that are both optimal (up to polylogarithmic factors) in the amount of communication they perform, and just as stable as Householder QR. We prove optimality by extending known lower bounds on communication bandwidth for sequential and parallel matrix multiplication to provide latency lower bounds, and show these bounds apply to the LU and QR decompositions. We not only show that our QR algorithms attain these lower bounds (up to polylogarithmic factors), but that existing LAPACK and ScaLAPACK algorithms perform asymptotically more communication. We also point out recent LU algorithms in the literature that attain at least some of these lower bounds.
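To give a feel for how a QR factorization can be organized to reduce communication, here is a minimal sketch of a tree-based reduction for a tall-and-skinny matrix, a scheme often called TSQR. The abstract does not spell out the algorithms, so this is an illustrative assumption rather than the authors' exact method: the function name `tsqr_r`, the binary tree shape, and the block count are all hypothetical choices. Each row block is factored independently, and only the small R factors are combined pairwise, so in a parallel setting each combine step would exchange an n x n triangle instead of full panels.

```python
import numpy as np


def tsqr_r(A, num_blocks=4):
    """Return an R factor of tall-skinny A using a binary reduction tree.

    Illustrative sketch only: Q is not formed, and the loop runs serially
    where a parallel code would run each tree level concurrently.
    """
    # Leaf step: factor each row block on its own (no communication).
    blocks = np.array_split(A, num_blocks, axis=0)
    rs = [np.linalg.qr(block, mode="r") for block in blocks]

    # Tree step: stack neighbouring R factors and re-factor until one remains.
    while len(rs) > 1:
        rs = [
            np.linalg.qr(np.vstack(rs[i:i + 2]), mode="r")
            for i in range(0, len(rs), 2)
        ]
    return rs[0]


rng = np.random.default_rng(0)
A = rng.standard_normal((64, 4))
R = tsqr_r(A)
R_ref = np.linalg.qr(A, mode="r")
# R is unique only up to the sign of each row, so compare magnitudes.
print(np.allclose(np.abs(R), np.abs(R_ref)))
```

The comparison uses absolute values because two valid R factors of a full-rank matrix can differ by the sign of each row.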
Taxonomy
Topics: Interconnection Networks and Systems · Stochastic Gradient Optimization Techniques · Parallel Computing and Optimization Techniques
