On Optimizing Distributed Tucker Decomposition for Sparse Tensors
Venkatesan T. Chakaravarthy, Jee W. Choi, Douglas J. Joseph, Prakash Murali, Shivmaran S. Pandian, Yogish Sabharwal, Dheeraj Sreedhar

TL;DR
This paper introduces a lightweight, near-optimal distribution scheme for parallel Tucker decomposition of sparse tensors, significantly reducing execution time on distributed systems compared to prior methods.
Contribution
We propose a simple, efficient distribution scheme for distributed Tucker decomposition that balances performance and setup cost, outperforming complex hypergraph-based methods.
Findings
Achieves up to 3x faster HOOI execution time than prior distribution schemes.
Distribution setup time is comparable to that of prior lightweight schemes.
The scheme is near-optimal on key computational metrics.
Abstract
The Tucker decomposition generalizes the notion of Singular Value Decomposition (SVD) to tensors, the higher-dimensional analogues of matrices. We study the problem of constructing the Tucker decomposition of sparse tensors on distributed-memory systems via the HOOI procedure, a popular iterative method. The scheme used for distributing the input tensor among the processors (MPI ranks) critically influences the HOOI execution time. Prior work has proposed different distribution schemes: an offline scheme based on a sophisticated hypergraph partitioning method, and simple, lightweight alternatives that can be used in real time. While the hypergraph-based scheme typically results in faster HOOI execution time, it is complex, and the time taken to determine the distribution is an order of magnitude higher than the execution time of a single HOOI iteration. Our main contribution is a lightweight…
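For readers unfamiliar with HOOI, the following is a minimal single-process sketch of the procedure on a small dense tensor, using NumPy. It is not the paper's distributed sparse implementation and says nothing about the distribution scheme; the function names (`unfold`, `mode_product`, `hooi`, `reconstruct`) and the HOSVD-style initialization are illustrative assumptions, chosen only to show the alternating structure that each HOOI iteration follows.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n matricization: rows indexed by `mode`, columns by all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Multiply `tensor` along `mode` by `matrix` of shape (new_dim, old_dim)."""
    moved = np.moveaxis(tensor, mode, -1)            # bring the mode to the last axis
    return np.moveaxis(moved @ matrix.T, -1, mode)   # contract it, then restore the order

def hooi(tensor, ranks, n_iter=10):
    """Dense HOOI sketch: returns (core, factors) with orthonormal factor columns."""
    # Assumed initialization: leading left singular vectors of each unfolding.
    factors = [np.linalg.svd(unfold(tensor, n), full_matrices=False)[0][:, :r]
               for n, r in enumerate(ranks)]
    for _ in range(n_iter):
        for n in range(tensor.ndim):
            # Project the tensor onto every factor except the one being updated.
            partial = tensor
            for m in range(tensor.ndim):
                if m != n:
                    partial = mode_product(partial, factors[m].T, m)
            # The new factor is the leading left singular vectors of the mode-n unfolding.
            u, _, _ = np.linalg.svd(unfold(partial, n), full_matrices=False)
            factors[n] = u[:, :ranks[n]]
    # The core is the tensor projected onto all factors.
    core = tensor
    for m, f in enumerate(factors):
        core = mode_product(core, f.T, m)
    return core, factors

def reconstruct(core, factors):
    """Expand a Tucker decomposition back into a full tensor."""
    out = core
    for m, f in enumerate(factors):
        out = mode_product(out, f, m)
    return out
```

In this sketch the projection step inside the inner loop is where almost all of the arithmetic happens; for a sparse tensor spread over many ranks, how the nonzero elements are assigned to ranks determines how that work is split.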
