Communication-optimal parallel and sequential QR and LU factorizations: theory and practice
James Demmel, Laura Grigori, Mark Hoemmen, and Julien Langou

TL;DR
This paper introduces communication-optimal algorithms for QR and LU factorizations that are efficient in both parallel and sequential settings, maintaining numerical stability comparable to traditional methods.
Contribution
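The paper presents two new algorithms, TSQR and CAQR, that minimize communication (up to polylogarithmic factors) while remaining as stable as Householder QR. TSQR targets tall, skinny matrices in a 1-D layout; CAQR handles general rectangular matrices in a 2-D block cyclic layout.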
Findings
Both algorithms are communication-optimal up to polylogarithmic factors.
TSQR is optimized for tall, skinny matrices in 1-D layouts.
CAQR extends TSQR to general rectangular matrices in 2-D layouts.
Abstract
We present parallel and sequential dense QR factorization algorithms that are both optimal (up to polylogarithmic factors) in the amount of communication they perform, and just as stable as Householder QR. Our first algorithm, Tall Skinny QR (TSQR), factors m-by-n matrices in a one-dimensional (1-D) block cyclic row layout, and is optimized for m >> n. Our second algorithm, CAQR (Communication-Avoiding QR), factors general rectangular matrices distributed in a two-dimensional block cyclic layout. It invokes TSQR for each block column factorization.
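The TSQR idea described in the abstract can be illustrated with a small sequential simulation. This is only a sketch under stated assumptions, not the authors' implementation: the function names `householder_r` and `tsqr_r` are hypothetical, the row blocks stand in for the processors of a 1-D layout, the pairwise (binary) reduction tree is an assumed choice of combination order, and only the R factor is computed (the Q factor is discarded). Pure Python is used to keep the example self-contained; a practical code would call a LAPACK-backed QR routine instead.

```python
import math


def householder_r(a):
    """Return the n-by-n R factor of an m-by-n matrix (m >= n) via Householder QR."""
    a = [row[:] for row in a]  # work on a copy
    m, n = len(a), len(a[0])
    for k in range(n):
        # 2-norm of column k from the diagonal down.
        norm = math.sqrt(sum(a[i][k] ** 2 for i in range(k, m)))
        if norm == 0.0:
            continue  # column is already zero below the diagonal
        # Householder vector v = x - alpha * e1, with alpha = -sign(x0) * ||x||.
        alpha = -math.copysign(norm, a[k][k])
        v = [a[i][k] for i in range(k, m)]
        v[0] -= alpha
        vtv = sum(x * x for x in v)
        # Apply H = I - 2 v v^T / (v^T v) to columns k..n-1.
        for j in range(k, n):
            s = 2.0 * sum(v[i] * a[k + i][j] for i in range(len(v))) / vtv
            for i in range(len(v)):
                a[k + i][j] -= s * v[i]
    # Keep the top n rows; set the strictly lower triangle to exact zeros.
    return [[a[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]


def tsqr_r(a, num_blocks):
    """R factor of a tall, skinny matrix using a binary reduction tree (assumed)."""
    m, n = len(a), len(a[0])
    rows = m // num_blocks
    if rows < n:
        raise ValueError("each row block needs at least n rows")
    # 1-D row-block split; the last block absorbs any remainder rows.
    blocks = [
        a[p * rows: (p + 1) * rows if p < num_blocks - 1 else m]
        for p in range(num_blocks)
    ]
    # Step 1: independent local QR on every block (no communication).
    rs = [householder_r(b) for b in blocks]
    # Step 2: stack pairs of n-by-n R factors and refactor until one remains.
    while len(rs) > 1:
        merged = []
        for i in range(0, len(rs), 2):
            if i + 1 < len(rs):
                merged.append(householder_r(rs[i] + rs[i + 1]))  # 2n-by-n stack
            else:
                merged.append(rs[i])  # odd block passes through unchanged
        rs = merged
    return rs[0]
```

In this sketch, each level of the tree exchanges only n-by-n triangles rather than full row blocks, which is where the communication savings for m >> n come from. The resulting R satisfies R^T R = A^T A up to rounding, and matches the R from a direct Householder QR up to the signs of its rows when A has full column rank.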
Taxonomy
Topics: graph theory and CDMA systems · Interconnection Networks and Systems · Wireless Communication Networks Research
