On Optimizing Distributed Tucker Decomposition for Dense Tensors
Venkatesan T Chakaravarthy, Jee W Choi, Douglas J Joseph, Xing Liu, Prakash Murali, Yogish Sabharwal, Dheeraj Sreedhar

TL;DR
This paper develops an optimal distributed implementation of Tucker decomposition for dense tensors, significantly reducing computational load and communication, and achieving up to 7x speed-up over previous heuristics.
Contribution
It introduces a systematic approach to optimize distributed Tucker decomposition, outperforming heuristics in efficiency and scalability.
Findings
Up to 7x speed-up in runtime.
Significant reduction in computational load and communication volume.
Optimal strategies outperform heuristics systematically.
Abstract
The Tucker decomposition expresses a given tensor as the product of a small core tensor and a set of factor matrices. Apart from providing data compression, the construction is useful in performing analysis such as principal component analysis (PCA) and finds applications in diverse domains such as signal processing, computer vision and text analytics. Our objective is to develop an efficient distributed implementation for the case of dense tensors. The implementation is based on the HOOI (Higher Order Orthogonal Iteration) procedure, wherein the tensor-times-matrix product forms the core routine. Prior work has proposed heuristics for reducing the computational load and communication volume incurred by the routine. We study the two metrics in a formal and systematic manner, and design strategies that are optimal under the two fundamental metrics. Our experimental evaluation on a large…
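To make the "computational load" metric concrete, here is a minimal sketch of how the cost of a chain of tensor-times-matrix (TTM) products depends on the order in which the modes are contracted. It is an illustration only, not the paper's strategy: it counts multiply-adds for a single TTM chain on one node, ignores communication entirely, and finds the cheapest order by brute force. The function names `ttm_chain_cost` and `best_order`, and the example sizes, are assumptions made for this sketch.

```python
from itertools import permutations
from math import prod


def ttm_chain_cost(dims, ranks, order):
    """Count multiply-adds for applying one TTM per mode, in the given order.

    dims[n]  : size of the input tensor along mode n
    ranks[n] : size of mode n after multiplying by the (transposed) factor matrix
    order    : sequence of modes to contract
    """
    current = list(dims)
    cost = 0
    for n in order:
        # A mode-n TTM costs ranks[n] multiply-adds per element of the
        # tensor as it stands before this product.
        cost += prod(current) * ranks[n]
        # After the product, mode n has shrunk from its old size to ranks[n].
        current[n] = ranks[n]
    return cost


def best_order(dims, ranks, modes):
    """Brute-force the cheapest contraction order (fine for a handful of modes)."""
    return min(permutations(modes), key=lambda o: ttm_chain_cost(dims, ranks, o))


if __name__ == "__main__":
    dims = (40, 100, 10)   # hypothetical tensor shape
    ranks = (4, 5, 5)      # hypothetical target core shape
    for order in permutations(range(3)):
        print(order, ttm_chain_cost(dims, ranks, order))
    print("cheapest:", best_order(dims, ranks, range(3)))
```

For these example sizes, the cheapest order costs 181,000 multiply-adds and the most expensive 304,000, which shows why the ordering of TTM products is worth optimizing even before distribution and communication are taken into account.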
Taxonomy
Topics: Tensor decomposition and applications
