Software for Sparse Tensor Decomposition on Emerging Computing Architectures
Eric Phipps, Tamara G. Kolda

TL;DR
This paper presents portable software for sparse tensor decomposition that achieves high performance across diverse architectures by leveraging multi-level parallelism, a new thread-local array construct, and optimized traversal strategies.
Contribution
The authors develop a portable, high-performance implementation of MTTKRP for tensor decomposition using the Kokkos framework and introduce compile-time polymorphic arrays for improved parallelism.
Findings
Performance matches architecture-specific codes
Effective across CPUs and GPUs
Reduces atomic-write contention
Abstract
In this paper, we develop software for decomposing sparse tensors that is portable to and performant on a variety of multicore, manycore, and GPU computing architectures. The result is a single code whose performance matches optimized architecture-specific implementations. The key to a portable approach is to determine multiple levels of parallelism that can be mapped in different ways to different architectures, and we explain how to do this for the matricized tensor times Khatri-Rao product (MTTKRP), which is the key kernel in canonical polyadic tensor decomposition. Our implementation leverages the Kokkos framework, which enables a single code to achieve high performance across multiple architectures that differ in how they approach fine-grained parallelism. We also introduce a new construct for portable thread-local arrays, which we call compile-time polymorphic arrays. Not only are…
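To make the MTTKRP kernel named in the abstract concrete, here is a minimal sequential sketch for a 3-way sparse tensor stored in coordinate (COO) form. It is not the authors' Kokkos implementation: the types `SparseTensor3` and `Matrix` and the function `mttkrp` are hypothetical names introduced for illustration, and the loop is deliberately serial. For mode n, each nonzero x(i0, i1, i2) contributes x times the product of the other two factor-matrix entries to row i_n of the result, for every rank column r.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical COO storage for a 3-way sparse tensor (illustrative only).
struct SparseTensor3 {
    std::array<std::size_t, 3> dims;               // size of each mode
    std::vector<std::array<std::size_t, 3>> idx;   // one (i0, i1, i2) per nonzero
    std::vector<double> val;                       // nonzero values
};

// Hypothetical dense row-major matrix, zero-initialized.
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
    double& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Sequential MTTKRP for the given mode:
//   M(i_mode, r) = sum over nonzeros of x * prod_{m != mode} A[m](i_m, r)
Matrix mttkrp(const SparseTensor3& X, const std::vector<Matrix>& A, std::size_t mode) {
    assert(mode < 3 && A.size() == 3);
    const std::size_t rank = A[0].cols;
    Matrix M(X.dims[mode], rank);
    for (std::size_t p = 0; p < X.val.size(); ++p) {      // coarse level: nonzeros
        const auto& i = X.idx[p];
        for (std::size_t r = 0; r < rank; ++r) {          // fine level: rank columns
            double t = X.val[p];
            for (std::size_t m = 0; m < 3; ++m) {
                if (m != mode) t *= A[m](i[m], r);
            }
            // Scatter update: several nonzeros can share row i[mode], so a
            // parallel version of the outer loop would need atomic writes or
            // per-thread accumulation at this line.
            M(i[mode], r) += t;
        }
    }
    return M;
}
```

The two nested loops hint at where multiple levels of parallelism could come from (nonzeros at the outer level, rank columns at the inner level), and the accumulation into `M` is the write that becomes contended once the outer loop runs in parallel. How those levels are actually mapped to hardware is specific to the paper's implementation and is not reproduced here.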
