Optimizing Distributed Tensor Contractions using Node-Aware Processor Grids
Andreas Irmler, Raghavendra Kanakagiri, Sebastian T. Ohlmann, Edgar Solomonik, Andreas Grüneis

TL;DR
This paper introduces a node-aware communication algorithm for distributed tensor contractions that reduces inter-node communication volume. This improves the performance of quantum chemistry calculations and matrix multiplication on large-scale systems built from multi-core nodes.
Contribution
The paper presents a node-aware processor grid algorithm, implemented in the Cyclops Tensor Framework (CTF), that improves the efficiency of distributed tensor contractions and is competitive with existing state-of-the-art libraries in large-scale computations.
Findings
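Significant performance improvements for matrix multiplication and tensor contractions on up to several hundred nodes, compared with implementations that do not use node-aware processor grids.
Good matrix multiplication performance compared with the state-of-the-art libraries COSMA and ScaLAPACK.
Faster quantum chemistry calculations due to reduced inter-node communication overhead.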
Abstract
We propose an algorithm that aims to minimize the inter-node communication volume of distributed and memory-efficient tensor contraction schemes on modern multi-core compute nodes. The key idea is to define processor grids that optimize the intra- and inter-node communication volume in the employed contraction algorithms. We present an implementation of the proposed node-aware communication algorithm in the Cyclops Tensor Framework (CTF). We demonstrate that this implementation achieves significantly improved performance for matrix-matrix multiplication and tensor contractions on up to several hundred modern compute nodes compared to conventional implementations that do not use node-aware processor grids. Our implementation shows good performance when compared with existing state-of-the-art parallel matrix multiplication libraries (COSMA and ScaLAPACK). In addition to the discussion of…
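To make the key idea concrete, here is a toy cost model. It is our own simplification and not the algorithm or cost model used in the paper or inside CTF. It assumes a SUMMA-style 2D matrix multiplication C = AB on a pr × pc process grid with ppn processes per node, where each grid row must assemble its slab of A and each grid column its slab of B, and where each node fetches the part of a slab it does not already own exactly once over the network and then shares it among its local processes. The function names `naive_mapping`, `node_aware_mapping`, `inter_node_words`, and `best_node_block` are hypothetical.

```python
from collections import Counter
from functools import partial


def naive_mapping(rank, pc):
    """Row-major placement: rank r sits at grid position (r // pc, r % pc)."""
    return divmod(rank, pc)


def node_aware_mapping(rank, pc, ppn, nr, nc):
    """Hypothetical node-aware placement: each node owns an nr x nc block of the grid.

    Assumes ranks are numbered node by node (ranks 0..ppn-1 on node 0, and so on).
    """
    node, local = divmod(rank, ppn)
    node_i, node_j = divmod(node, pc // nc)  # position of this node in the node grid
    loc_i, loc_j = divmod(local, nc)         # position inside the node's block
    return node_i * nr + loc_i, node_j * nc + loc_j


def inter_node_words(mapping, pr, pc, ppn, m, n, k):
    """Toy model of the inter-node words moved by a 2D (SUMMA-like) C = A @ B.

    Grid row i must assemble an (m/pr) x k slab of A, grid column j a k x (n/pc)
    slab of B; each process initially owns an equal share of its row/column slab.
    Assumption: every node fetches the parts of a slab it does not own exactly
    once over the network and then shares them with its local processes.
    """
    if (pr * pc) % ppn:
        raise ValueError("ppn must divide the number of processes")
    row_nodes = [Counter() for _ in range(pr)]  # row -> {node: processes in row}
    col_nodes = [Counter() for _ in range(pc)]  # column -> {node: processes in column}
    for rank in range(pr * pc):
        i, j = mapping(rank)
        node = rank // ppn
        row_nodes[i][node] += 1
        col_nodes[j][node] += 1

    a_slab = (m // pr) * k  # words of A needed by one grid row
    b_slab = k * (n // pc)  # words of B needed by one grid column
    words = 0
    for counts in row_nodes:
        for owned in counts.values():  # node already holds owned/pc of the slab
            words += a_slab * (pc - owned) // pc
    for counts in col_nodes:
        for owned in counts.values():  # node already holds owned/pr of the slab
            words += b_slab * (pr - owned) // pr
    return words


def best_node_block(pr, pc, ppn, m, n, k):
    """Try every nr x nc = ppn block shape that tiles the grid; return the cheapest."""
    candidates = []
    for nr in range(1, ppn + 1):
        nc = ppn // nr
        if ppn % nr or pr % nr or pc % nc:
            continue  # block shape does not tile the node or the grid
        mapping = partial(node_aware_mapping, pc=pc, ppn=ppn, nr=nr, nc=nc)
        words = inter_node_words(mapping, pr, pc, ppn, m, n, k)
        candidates.append((words, (nr, nc)))
    return min(candidates)[1]


if __name__ == "__main__":
    pr = pc = 8   # 64 processes
    ppn = 16      # 4 nodes with 16 processes each
    N = 1024      # square matrices
    naive = inter_node_words(partial(naive_mapping, pc=pc), pr, pc, ppn, N, N, N)
    aware = inter_node_words(partial(node_aware_mapping, pc=pc, ppn=ppn, nr=4, nc=4),
                             pr, pc, ppn, N, N, N)
    print(naive, aware, best_node_block(pr, pc, ppn, N, N, N))
    # → 3145728 2097152 (4, 4)
```

With the row-major mapping, every node holds two complete grid rows, so A never crosses a node boundary but every column of B spans four nodes. Giving each node a 4 × 4 block balances the two directions and lowers the modeled inter-node volume from 3N² to 2N² words for N = 1024. A real node-aware grid must also account for tensor shapes, memory limits, and intra-node traffic, all of which this sketch ignores.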
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems · Tensor decomposition and applications
