Communication-optimal Parallel and Sequential Cholesky Decomposition
Grey Ballard, James Demmel, Olga Holtz, Oded Schwartz

TL;DR
This paper extends lower bounds on communication cost (both bandwidth and latency) from conventional matrix multiplication to Cholesky decomposition. It then identifies algorithms that attain these bounds, minimizing data movement in both sequential and parallel dense linear algebra.
Contribution
It generalizes known communication lower bounds to Cholesky factorization. It also identifies the algorithms and data structures that attain them on two-level and hierarchical memories and on parallel machines.
Findings
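Derived bandwidth and latency lower bounds on communication for Cholesky decomposition.
Compared various Cholesky implementations against these bounds and identified the algorithms and data structures that attain them.
In the sequential case, analyzed both the two-level and the hierarchical (multi-level) memory models.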
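For orientation, the conventional matrix-multiplication bounds being extended have the following standard form. This is a summary under stated assumptions, not a quotation of the paper's theorems. It assumes an n×n matrix, a fast memory of M words with n^2 much larger than M in the sequential case, and P processors each storing M = Θ(n^2/P) words in the parallel case. W counts words moved and S counts messages. The parallel line follows by substituting the per-processor flop count n^3/P and M = n^2/P into the sequential expressions.

```latex
W_{\mathrm{seq}} = \Omega\!\left(\frac{n^3}{\sqrt{M}}\right), \qquad
S_{\mathrm{seq}} = \Omega\!\left(\frac{n^3}{M^{3/2}}\right)

W_{\mathrm{par}} = \Omega\!\left(\frac{n^3/P}{\sqrt{n^2/P}}\right) = \Omega\!\left(\frac{n^2}{\sqrt{P}}\right), \qquad
S_{\mathrm{par}} = \Omega\!\left(\frac{n^3/P}{(n^2/P)^{3/2}}\right) = \Omega\!\left(\sqrt{P}\right)
```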
Abstract
Numerical algorithms have two kinds of costs: arithmetic and communication, by which we mean either moving data between levels of a memory hierarchy (in the sequential case) or over a network connecting processors (in the parallel case). Communication costs often dominate arithmetic costs, so it is of interest to design algorithms minimizing communication. In this paper we first extend known lower bounds on the communication cost (both for bandwidth and for latency) of conventional (O(n^3)) matrix multiplication to Cholesky factorization, which is used for solving dense symmetric positive definite linear systems. Second, we compare the costs of various Cholesky decomposition implementations to these lower bounds and identify the algorithms and data structures that attain them. In the sequential case, we consider both the two-level and hierarchical memory models. Combined with prior…
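The sketch below shows why blocking matters for bandwidth in the two-level model. It implements a right-looking tiled Cholesky factorization, a common blocked variant that is not necessarily the specific algorithm the paper analyzes. Alongside the factorization, it counts the matrix entries moved between fast and slow memory under a deliberately simple, hypothetical cost model: every tile operation reads its input tiles from slow memory and writes its output tile back. The function name `blocked_cholesky`, the tile size `b`, and the `words` counter are illustrative assumptions, not names from the paper. Choosing `b` so that about three b×b tiles fit in fast memory (b ≈ √(M/3)) is also an assumption of this sketch.

```python
import math


def blocked_cholesky(A, b):
    """Right-looking tiled Cholesky: returns (L, words) with A = L L^T.

    A: symmetric positive definite matrix as a list of lists (not modified).
    b: tile size (hypothetical tuning parameter).
    words: entries moved under a toy two-level model (an assumption, not the
    paper's cost model) in which each tile operation reads its input tiles
    from slow memory and writes its output tile back.
    """
    n = len(A)
    L = [list(map(float, row)) for row in A]  # copy; lower triangle is overwritten
    words = 0

    for k in range(0, n, b):
        kk = min(b, n - k)
        # 1) Factor the diagonal tile A_kk = L_kk L_kk^T (read + write one tile).
        words += 2 * kk * kk
        for j in range(k, k + kk):
            L[j][j] = math.sqrt(L[j][j] - sum(L[j][p] ** 2 for p in range(k, j)))
            for r in range(j + 1, k + kk):
                L[r][j] = (L[r][j] - sum(L[r][p] * L[j][p] for p in range(k, j))) / L[j][j]

        # 2) Panel solve L_ik = A_ik L_kk^{-T} for every tile below the diagonal.
        for i in range(k + kk, n, b):
            ii = min(b, n - i)
            words += kk * kk + 2 * ii * kk  # read L_kk and A_ik, write L_ik
            for r in range(i, i + ii):
                for j in range(k, k + kk):
                    L[r][j] = (L[r][j] - sum(L[r][p] * L[j][p] for p in range(k, j))) / L[j][j]

        # 3) Trailing update A_ij -= L_ik L_jk^T, lower triangle only.
        for j0 in range(k + kk, n, b):
            jj = min(b, n - j0)
            for i in range(j0, n, b):
                ii = min(b, n - i)
                words += ii * kk + jj * kk + 2 * ii * jj  # read L_ik, L_jk; read+write A_ij
                for r in range(i, i + ii):
                    for c in range(j0, min(j0 + jj, r + 1)):
                        L[r][c] -= sum(L[r][p] * L[c][p] for p in range(k, k + kk))

    # Clear the stale strict upper triangle so L is genuinely lower triangular.
    for r in range(n):
        for c in range(r + 1, n):
            L[r][c] = 0.0
    return L, words


if __name__ == "__main__":
    n = 8
    # min(i, j) + 1 equals L L^T where L is the all-ones lower triangular matrix.
    A = [[min(i, j) + 1 for j in range(n)] for i in range(n)]
    for b in (1, 2, 4):
        L, words = blocked_cholesky(A, b)
        print(f"b={b}: words moved = {words}")
```

For n = 8 the demo reports 436, 264 and 176 words for b = 1, 2 and 4. In this toy model the count shrinks roughly like n^3/b, which mirrors the n^3/√M bandwidth bound when b = Θ(√M). The model ignores latency (number of messages) and data layout, which the paper treats separately.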
