Communication Lower Bounds for Matricized Tensor Times Khatri-Rao Product
Grey Ballard, Nicholas Knight, Kathryn Rouse

TL;DR
This paper establishes communication lower bounds for the matricized-tensor times Khatri-Rao product (MTTKRP), the typical bottleneck in computing a CP decomposition of a tensor. It also presents sequential and parallel algorithms that attain these bounds and so minimize data movement.
Contribution
It provides the first known communication lower bounds for this tensor operation. It also introduces algorithms that attain these bounds and require less communication than the straightforward approach of casting the computation as a matrix multiplication.
Findings
Communication lower bounds are established for the computation with dense tensors.
Sequential and parallel algorithms are developed that attain these lower bounds and are therefore communication optimal.
The structure of the computation allows less communication than the straightforward approach of casting it as a matrix multiplication.
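To make "the structure of the computation" concrete, here is the standard definition of the mode-1 MTTKRP for a 3-way tensor. The 3-way case and the Kolda and Bader unfolding convention are assumptions chosen for illustration; the paper's own notation may be more general. The matrix form is the straightforward matrix-multiplication view. The elementwise form shows that each output entry is a sum of triple products, so the JK × R Khatri-Rao product never has to be formed explicitly.

```latex
\mathbf{M} = \mathbf{X}_{(1)} \, (\mathbf{C} \odot \mathbf{B}),
\qquad
M(i,r) = \sum_{j=1}^{J} \sum_{k=1}^{K} X(i,j,k) \, B(j,r) \, C(k,r),
\\[4pt]
\mathbf{X}_{(1)} \in \mathbb{R}^{I \times JK}, \quad
\mathbf{C} \odot \mathbf{B} \in \mathbb{R}^{JK \times R}, \quad
\mathbf{M} \in \mathbb{R}^{I \times R}.
```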
Abstract
The matricized-tensor times Khatri-Rao product computation is the typical bottleneck in algorithms for computing a CP decomposition of a tensor. In order to develop high performance sequential and parallel algorithms, we establish communication lower bounds that identify how much data movement is required for this computation in the case of dense tensors. We also present sequential and parallel algorithms that attain the lower bounds and are therefore communication optimal. In particular, we show that the structure of the computation allows for less communication than the straightforward approach of casting the computation as a matrix multiplication operation.
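The following NumPy sketch contrasts two ways of computing the mode-1 MTTKRP of a dense 3-way tensor. It assumes NumPy and uses hypothetical helper names (`khatri_rao`, `mttkrp_via_matmul`, `mttkrp_direct`). The first function takes the straightforward route of forming the Khatri-Rao product and performing one matrix multiply. The second contracts the tensor with the factor matrices directly and does not explicitly form the Khatri-Rao product. This is only a correctness illustration, not the communication-optimal algorithm from the paper.

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Kronecker product: column r is kron(U[:, r], V[:, r])."""
    assert U.shape[1] == V.shape[1], "factor matrices must share the rank R"
    R = U.shape[1]
    # Row index j * V.shape[0] + k holds U[j, r] * V[k, r].
    return np.einsum("jr,kr->jkr", U, V).reshape(-1, R)

def mttkrp_via_matmul(X, B, C):
    """Straightforward approach: form the Khatri-Rao product, then one matrix multiply."""
    I, J, K = X.shape
    # A C-order reshape puts X[i, j, k] in column j * K + k, matching khatri_rao(B, C).
    X1 = X.reshape(I, J * K)
    return X1 @ khatri_rao(B, C)  # (I x JK) @ (JK x R) -> I x R

def mttkrp_direct(X, B, C):
    """Same result via M[i, r] = sum_{j,k} X[i, j, k] * B[j, r] * C[k, r]."""
    return np.einsum("ijk,jr,kr->ir", X, B, C)

rng = np.random.default_rng(0)
I, J, K, R = 4, 5, 6, 3
X = rng.standard_normal((I, J, K))
B = rng.standard_normal((J, R))
C = rng.standard_normal((K, R))

M1 = mttkrp_via_matmul(X, B, C)
M2 = mttkrp_direct(X, B, C)
print(M1.shape, np.allclose(M1, M2))  # (4, 3) True
```

In the first version, the explicit Khatri-Rao matrix has J·K·R entries. This exceeds the I·J·K entries of the tensor itself whenever R > I.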
