Sparse MTTKRP Acceleration for Tensor Decomposition on GPU
Sasindu Wijeratne, Rajgopal Kannan, Viktor Prasanna

TL;DR
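This paper presents a GPU algorithm for sparse Matricized Tensor Times Khatri-Rao Product (spMTTKRP), the bottleneck kernel of sparse tensor decomposition. It supports tensors with more than 4 modes and reports geometric mean speedups of 1.5x, 2.0x, and 21.7x in total execution time over state-of-the-art GPU implementations.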
Contribution
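The work introduces a GPU algorithm design that eliminates global atomic operations across GPU thread blocks, avoids communicating intermediate values between thread blocks and GPU global memory, and balances the workload across thread blocks. Dynamic tensor remapping enables these optimizations in every mode of the input tensor.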
Findings
Achieves geometric mean speedups of 1.5x, 2.0x, and 21.7x in total execution time over state-of-the-art GPU implementations across widely used datasets.
Is the only GPU implementation that supports tensors with more than 4 modes.
Abstract
Sparse Matricized Tensor Times Khatri-Rao Product (spMTTKRP) is the bottleneck kernel of sparse tensor decomposition. In this work, we propose a GPU-based algorithm design to address the key challenges in accelerating spMTTKRP computation, including (1) eliminating global atomic operations across GPU thread blocks, (2) avoiding the intermediate values being communicated between GPU thread blocks and GPU global memory, and (3) ensuring a balanced distribution of workloads across GPU thread blocks. Our approach also supports dynamic tensor remapping, enabling the above optimizations in all the modes of the input tensor. Our approach achieves a geometric mean speedup of 1.5x, 2.0x, and 21.7x in total execution time across widely used datasets compared with the state-of-the-art GPU implementations. Our work is the only GPU implementation that can support tensors with modes greater than 4…
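For readers unfamiliar with the kernel, the following is a minimal CPU reference sketch of what spMTTKRP computes, not the paper's GPU algorithm. It assumes a coordinate (COO) tensor layout and plain Python lists for the factor matrices. The names `spmttkrp` and `group_by_output_row` are hypothetical and chosen for illustration only. The second helper shows one simple way nonzeros could be grouped so that each output row has a single owner, which is the general idea behind avoiding atomic updates; the paper's actual partitioning and remapping scheme is not described in this summary.

```python
from collections import defaultdict


def spmttkrp(indices, values, factors, mode):
    """Reference spMTTKRP along `mode` for a COO sparse tensor.

    indices: list of N-tuples (one coordinate per tensor mode)
    values:  list of nonzero values, same length as indices
    factors: list of N factor matrices (lists of rows), each with R columns
    Returns a dense (I_mode x R) matrix as a list of lists.
    """
    rank = len(factors[0][0])
    out = [[0.0] * rank for _ in range(len(factors[mode]))]
    for idx, val in zip(indices, values):
        # Start from the nonzero value and multiply in the matching row
        # of every factor matrix except the one for the output mode.
        row = [float(val)] * rank
        for m, i in enumerate(idx):
            if m == mode:
                continue
            for r in range(rank):
                row[r] *= factors[m][i][r]
        # Accumulate into the output row selected by the output-mode index.
        # On a GPU this update is where write conflicts (atomics) can arise.
        target = out[idx[mode]]
        for r in range(rank):
            target[r] += row[r]
    return out


def group_by_output_row(indices, values, mode):
    """Illustrative assumption: bucket nonzeros by their output-mode index,
    so one worker could own each output row without conflicting writes."""
    groups = defaultdict(list)
    for idx, val in zip(indices, values):
        groups[idx[mode]].append((idx, val))
    return dict(groups)


# Tiny 2 x 2 x 2 example with rank 2 (all data here is made up).
A = [[1.0, 2.0], [3.0, 4.0]]
B = [[1.0, 0.0], [0.0, 1.0]]
C = [[1.0, 1.0], [2.0, 2.0]]
nz_indices = [(0, 0, 0), (0, 1, 1), (1, 0, 1)]
nz_values = [1.0, 2.0, 3.0]
result_mode0 = spmttkrp(nz_indices, nz_values, [A, B, C], mode=0)
```

Calling `spmttkrp` once per mode gives the full set of updates used in one iteration of a CP-style decomposition. Because the grouping depends on the chosen mode, a layout that suits one mode does not automatically suit another, which is the motivation the abstract gives for remapping the tensor between modes.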
