Minimizing Communication in Linear Algebra
Grey Ballard, James Demmel, Olga Holtz, Oded Schwartz

TL;DR
This paper generalizes known communication lower bounds to a wide range of linear algebra algorithms (dense and sparse, sequential and parallel), and shows how to extend the bounds to compositions of operations and to related problems.
Contribution
It extends existing lower bounds to many linear algebra algorithms and provides methods to analyze and optimize communication in complex compositions.
Findings
Lower bounds apply to dense and sparse matrices, sequential and parallel algorithms.
New techniques analyze communication in compositions of linear algebra operations.
Examples show existing algorithms meet these lower bounds, enabling significant speedups.
Abstract
In 1981 Hong and Kung proved a lower bound on the amount of communication needed to perform dense matrix multiplication using the conventional algorithm, where the input matrices were too large to fit in the small, fast memory. In 2004 Irony, Toledo and Tiskin gave a new proof of this result and extended it to the parallel case. In both cases the lower bound may be expressed as $\Omega(\#\text{arithmetic operations} / \sqrt{M})$, where $M$ is the size of the fast memory (or local memory in the parallel case). Here we generalize these results to a much wider variety of algorithms, including LU factorization, Cholesky factorization, $LDL^T$ factorization, QR factorization, algorithms for eigenvalues and singular values, i.e., essentially all direct methods of linear algebra. The proof works for dense or sparse matrices, and for sequential or parallel algorithms. In addition to lower…
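As a rough illustration of the bound quoted in the abstract, the sketch below compares (#arithmetic operations / √M) for conventional dense matrix multiplication against the words moved by a simple square-tiled algorithm. It is not taken from the paper: the function names, the 2n³ operation count, the tile size b, and the assumption that three b × b tiles fit in fast memory are all illustrative choices, and the constant hidden in the bound is ignored.

```python
import math


def comm_lower_bound(flops, fast_memory_words):
    """Order-of-magnitude bound flops / sqrt(M); the hidden constant is omitted."""
    return flops / math.sqrt(fast_memory_words)


def tiled_matmul_words_moved(n, b):
    """Count words moved between slow and fast memory by a b x b tiled C = A * B.

    Assumptions (hypothetical model): b divides n, and fast memory holds
    three b x b tiles at once, i.e. M >= 3 * b * b.
    """
    if n % b != 0:
        raise ValueError("tile size b must divide n")
    tiles = n // b
    words = 0
    for _i in range(tiles):
        for _j in range(tiles):
            words += b * b          # load one tile of C
            for _k in range(tiles):
                words += 2 * b * b  # load one tile of A and one tile of B
            words += b * b          # store the finished tile of C
    return words


if __name__ == "__main__":
    n, b = 1024, 64
    M = 3 * b * b                   # fast memory sized to hold exactly three tiles
    flops = 2 * n**3                # assumed count: n^3 multiplies plus about n^3 adds
    moved = tiled_matmul_words_moved(n, b)
    bound = comm_lower_bound(flops, M)
    print(f"words moved: {moved}, bound (no constant): {bound:.0f}, ratio: {moved / bound:.2f}")
```

In this model the tiled algorithm moves 2n² + 2n³/b words, so choosing b proportional to √M keeps the traffic within a constant factor of the bound, which is the sense in which such an algorithm is said to attain it.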
