Distributed and heterogeneous tensor-vector contraction algorithms for high performance computing
Pedro J. Martinez-Ferrer, Albert-Jan Yzelman, Vicenç Beltran

TL;DR
This paper introduces distributed-memory algorithms for tensor-vector contractions and a novel distributed higher-order power method, achieving high performance and scalability on modern architectures with benefits from mixed precision arithmetic.
Contribution
It presents a distributed-memory tensor-vector contraction algorithm and a new distributed HOPM that are architecture-agnostic and highly efficient, with demonstrated scalability and performance.
Findings
dTVC and dHOPM3 sustain 50%-80% of peak system memory bandwidth, depending on the architecture.
Can outperform CUDA batched kernels in strong scalability scenarios.
Benefits from mixed precision arithmetic even without native low precision support.
Abstract
The tensor-vector contraction (TVC) is the most memory-bound operation of its class and a core component of the higher-order power method (HOPM). This paper brings distributed-memory parallelization to a native TVC algorithm for dense tensors that overall remains oblivious to contraction mode, tensor splitting and tensor order. Similarly, we propose a novel distributed HOPM, namely dHOPM3, that can save up to one order of magnitude of streamed memory and is about twice as costly in terms of data movement as a distributed TVC operation (dTVC) when using task-based parallelization. The numerical experiments carried out in this work on three different architectures featuring multi-core and accelerators confirm that the performances of dTVC and dHOPM3 remain relatively close to the peak system memory bandwidth (50%-80%, depending on the architecture) and on par with STREAM benchmark…
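To make the two operations named in the abstract concrete, here is a minimal sequential sketch in pure Python. It is not the authors' dTVC or dHOPM3: there is no distribution, tasking or mixed precision, and the function names `tvc` and `hopm`, the flat row-major storage, and the uniform starting vectors are all assumptions made for illustration. It only shows how a single loop nest can contract any mode of a tensor of any order by viewing the tensor as (left, n_k, right), and how a rank-1 power iteration can be built from repeated contractions.

```python
from math import prod, sqrt


def tvc(tensor, shape, vec, mode):
    """Contract mode `mode` of a dense row-major tensor with `vec`.

    `tensor` is a flat list, `shape` a tuple of dimensions. The tensor is
    viewed as (left, n_k, right), so the same loops serve every mode and
    every tensor order. Illustrative only, not the paper's algorithm.
    """
    n_k = shape[mode]
    if len(vec) != n_k:
        raise ValueError("vector length must match the contracted mode")
    left = prod(shape[:mode])        # product of dimensions before the mode
    right = prod(shape[mode + 1:])   # product of dimensions after the mode
    out = [0.0] * (left * right)
    for l in range(left):
        for k in range(n_k):
            base = (l * n_k + k) * right
            v = vec[k]
            for r in range(right):
                out[l * right + r] += tensor[base + r] * v
    return out, tuple(shape[:mode]) + tuple(shape[mode + 1:])


def hopm(tensor, shape, iters=20):
    """Textbook rank-1 higher-order power method built on `tvc`.

    Assumes a nonzero tensor that is not orthogonal to the uniform
    starting vectors (otherwise the normalization divides by zero).
    """
    order = len(shape)
    vecs = [[1.0 / sqrt(n)] * n for n in shape]
    lam = 0.0
    for _ in range(iters):
        for j in range(order):
            cur, cur_shape = tensor, tuple(shape)
            # Contract the highest modes first so that the indices of the
            # remaining lower modes stay valid after each contraction.
            for m in reversed(range(order)):
                if m != j:
                    cur, cur_shape = tvc(cur, cur_shape, vecs[m], m)
            lam = sqrt(sum(x * x for x in cur))
            vecs[j] = [x / lam for x in cur]
    return lam, vecs
```

For example, `tvc([1, 2, 3, 4, 5, 6], (2, 3), [1, 1, 1], 1)` sums each row of a 2x3 matrix and returns `([6.0, 15.0], (2,))`. Each update inside `hopm` performs one fewer contraction than the tensor order, and each contraction touches every element of its input once, which is why the operation is bound by memory bandwidth rather than by arithmetic.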
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Computational Physics and Python Applications · Physics of Superconductivity and Magnetism
