A 3D Parallel Algorithm for QR Decomposition
Grey Ballard, James Demmel, Laura Grigori, Mathias Jacquelin, Nicholas Knight

TL;DR
This paper introduces a 3D parallel algorithm for QR decomposition whose bandwidth cost can be traded against its latency cost, so it can be tuned for machines with different communication costs.
Contribution
It presents a parallel QR decomposition algorithm with a tunable tradeoff between bandwidth cost and latency cost.
Findings
Bandwidth cost (communication volume) can be decreased at the cost of a higher latency cost (number of messages).
Offers a tunable parameter to balance bandwidth and latency.
Can be tuned for machines with different communication costs.
Abstract
Interprocessor communication often dominates the runtime of large matrix computations. We present a parallel algorithm for computing QR decompositions whose bandwidth cost (communication volume) can be decreased at the cost of increasing its latency cost (number of messages). By varying a parameter to navigate the bandwidth/latency tradeoff, we can tune this algorithm for machines with different communication costs.
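To make the tuning idea concrete, here is a minimal Python sketch of choosing the tradeoff parameter under the standard alpha-beta communication model, where a run costs `alpha` seconds per message plus `beta` seconds per word. The cost curves in `toy_costs`, the parameter name `delta`, and the candidate values are assumptions made for illustration only; they are not the bounds proved in the paper.

```python
def modeled_comm_time(messages, words, alpha, beta):
    """Alpha-beta model: alpha seconds per message plus beta seconds per word."""
    return alpha * messages + beta * words


def toy_costs(n, p, delta):
    """Hypothetical cost curves (NOT the paper's bounds).

    A larger delta lowers the word count but raises the message count,
    mimicking a bandwidth/latency tradeoff on p processors for an n-by-n matrix.
    """
    words = n * n / p ** delta
    messages = p ** delta
    return messages, words


def best_delta(n, p, alpha, beta, candidates):
    """Return the candidate delta with the smallest modeled communication time."""
    def time_for(delta):
        messages, words = toy_costs(n, p, delta)
        return modeled_comm_time(messages, words, alpha, beta)

    return min(candidates, key=time_for)


candidates = [0.50, 0.55, 0.60, 0.65]

# Machine where each message is expensive: prefer fewer messages.
latency_bound = best_delta(n=4096, p=1024, alpha=1.0, beta=1e-9, candidates=candidates)

# Machine where each word is expensive: prefer less communication volume.
bandwidth_bound = best_delta(n=4096, p=1024, alpha=1e-9, beta=1.0, candidates=candidates)

print(latency_bound, bandwidth_bound)
```

Under these toy curves, the machine with expensive messages picks the smallest candidate and the machine with expensive words picks the largest, which is the kind of machine-dependent choice the abstract describes.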
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Interconnection Networks and Systems · Embedded Systems Design Techniques
