Analyzing the Performance Portability of Tensor Decomposition
S. Isaac Geronimo Anderson, Keita Teranishi, Daniel M. Dunlavy, and Jee Choi

TL;DR
This paper evaluates the performance portability of tensor decomposition algorithms, identifying bottlenecks, optimizing implementations with Kokkos, and comparing against vendor libraries across CPU and GPU systems.
Contribution
It provides a detailed analysis of performance bottlenecks in tensor decomposition, demonstrates optimization techniques with Kokkos, and assesses portability across hardware platforms.
Findings
Identified the matrix computation $\Phi^{(n)}$ as the main bottleneck.
Achieved 2.25x average speedup on CPU and 1.70x on GPU over the SparTen default by grid-searching Kokkos parallel policy parameters.
Kokkos offers performance comparable to vendor libraries for key tensor operations.
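The tuning result above comes from a grid search over parallel policy parameters. The following is a minimal sketch of that idea in Python, not the authors' tooling. It assumes the tuned parameters are a team size and a vector length (the summary does not name them), and `measure`, `grid_search`, and `speedup` are hypothetical names introduced only for illustration.

```python
import itertools


def grid_search(measure, team_sizes, vector_lengths):
    """Evaluate every (team_size, vector_length) pair and keep the fastest.

    `measure` is a caller-supplied function (hypothetical) that runs the
    kernel with the given configuration and returns elapsed seconds.
    """
    timings = {}
    for team_size, vector_length in itertools.product(team_sizes, vector_lengths):
        timings[(team_size, vector_length)] = measure(team_size, vector_length)
    # The best configuration is the one with the smallest measured time.
    best_config = min(timings, key=timings.get)
    return best_config, timings[best_config], timings


def speedup(default_seconds, tuned_seconds):
    """Speedup of the tuned configuration relative to the default one."""
    return default_seconds / tuned_seconds
```

In practice `measure` would launch the real kernel several times and return a robust statistic such as the minimum or median time. An exhaustive grid is only feasible when the candidate lists are short, since the number of runs is the product of their lengths.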
Abstract
We employ pressure point analysis and roofline modeling to identify performance bottlenecks and determine an upper bound on the performance of the Canonical Polyadic Alternating Poisson Regression Multiplicative Update (CP-APR MU) algorithm in the SparTen software library. Our analyses reveal that a particular matrix computation, $\Phi^{(n)}$, is the critical performance bottleneck in the SparTen CP-APR MU implementation. Moreover, we find that atomic operations are not a critical bottleneck while higher cache reuse can provide a non-trivial performance improvement. We also utilize grid search on the Kokkos library parallel policy parameters to achieve 2.25x average speedup over the SparTen default for computation on CPU and 1.70x on GPU. We conclude our investigations by comparing Kokkos implementations of the STREAM benchmark and the matricized tensor times Khatri-Rao…
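The abstract uses roofline modeling to bound attainable performance. As a rough illustration of the standard roofline formula only (not the paper's measurements), the sketch below computes the bound as the minimum of peak compute and arithmetic intensity times memory bandwidth. The function names and any machine numbers passed in are hypothetical.

```python
def roofline_bound(arithmetic_intensity, peak_flops, peak_bandwidth):
    """Upper bound on attainable FLOP/s under the roofline model.

    arithmetic_intensity: FLOPs performed per byte moved from memory.
    peak_flops:           peak compute throughput in FLOP/s.
    peak_bandwidth:       peak memory bandwidth in bytes/s.
    """
    return min(peak_flops, arithmetic_intensity * peak_bandwidth)


def ridge_point(peak_flops, peak_bandwidth):
    """Arithmetic intensity at which a kernel stops being memory-bound."""
    return peak_flops / peak_bandwidth
```

A kernel whose arithmetic intensity falls below the ridge point is limited by memory bandwidth in this model, which is why cache reuse can matter more than raw compute for such kernels.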
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Tensor decomposition and applications
