Design of a high-performance GEMM-like Tensor-Tensor Multiplication

Paul Springer; Paolo Bientinesi

arXiv:1607.00145 · cs.MS · November 8, 2017

TL;DR

This paper introduces GETT, a high-performance tensor contraction method modeled on the design of GEMM. It gets its speedups by packing sub-tensors for cache reuse and by fully vectorizing the computation, without requiring auxiliary memory.

Contribution

GETT systematically reduces arbitrary tensor contractions to loops around an optimized GEMM-like kernel. This yields high performance and cache efficiency without auxiliary memory, and GETT outperforms existing tensor contraction methods.
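As a minimal sketch of the first step of such a reduction, assume a contraction C = A·B written with single-character index labels and no batch indices (every index appears in exactly two of the three tensors). The hypothetical helper classify_indices sorts the labels into the three index sets the abstract mentions: I_m (free indices of A, shared with C), I_n (free indices of B, shared with C) and I_k (contracted indices, shared by A and B). The products of their extents, computed by the equally hypothetical gemm_sizes, give the M, N, K of an equivalent matrix product. Names, the string-label convention and the example extents are illustrative assumptions, not GETT's interface.

```python
from math import prod


def classify_indices(a_idx, b_idx, c_idx):
    """Split the index labels of C = A * B into (I_m, I_n, I_k).

    Hypothetical helper: I_m appear in A and C, I_n in B and C,
    I_k in A and B only (they are summed over).
    """
    i_m = tuple(i for i in a_idx if i in c_idx and i not in b_idx)
    i_n = tuple(i for i in b_idx if i in c_idx and i not in a_idx)
    i_k = tuple(i for i in a_idx if i in b_idx and i not in c_idx)
    if set(i_m + i_n + i_k) != set(a_idx + b_idx + c_idx):
        # e.g. a batch index present in all three tensors: out of scope here
        raise ValueError("unsupported index pattern for this sketch")
    return i_m, i_n, i_k


def gemm_sizes(ext, i_m, i_n, i_k):
    """(M, N, K) of the matrix product equivalent to the contraction."""
    return (prod(ext[i] for i in i_m),
            prod(ext[i] for i in i_n),
            prod(ext[i] for i in i_k))
```

For example, C[a,b,c] = Σ_k A[a,k,c]·B[k,b] gives I_m = (a, c), I_n = (b,) and I_k = (k,); with extents a=3, k=5, c=2, b=4 the equivalent GEMM has M=6, N=4, K=5.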

Findings

1. GETT outperforms existing approaches by up to 12.4x in bandwidth-bound cases.
2. GETT achieves up to a 1.41x speedup over an equivalent GEMM for certain tensor contractions.
3. GETT reaches up to 91.3% of peak floating-point performance.

Abstract

We present "GEMM-like Tensor-Tensor multiplication" (GETT), a novel approach to tensor contractions that mirrors the design of a high-performance general matrix-matrix multiplication (GEMM). The critical insight behind GETT is the identification of three index sets, involved in the tensor contraction, which enable us to systematically reduce an arbitrary tensor contraction to loops around a highly tuned "macro-kernel". This macro-kernel operates on suitably prepared ("packed") sub-tensors that reside in a specified level of the cache hierarchy. In contrast to previous approaches to tensor contractions, GETT exhibits desirable features such as unit-stride memory accesses, cache-awareness, as well as full vectorization, without requiring auxiliary memory. To compare our technique with other modern tensor contractions, we integrate GETT alongside the so-called…
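The loop structure described in the abstract can be illustrated with a deliberately simplified, pure-Python sketch. It assumes dense row-major tensors stored as flat lists, index sets that are already known, and GotoBLAS-style block sizes mc, nc, kc chosen arbitrarily; gett_sketch, group_offsets and macro_kernel are hypothetical names. The sketch loops over blocks of the combined n, k and m index spaces, packs each block of A and B into a small contiguous buffer by reading the original tensors through their strides, and hands the packed blocks to a scalar stand-in for the macro-kernel. It shows the data flow only: vectorized micro-kernels and cache-tuned block sizes, which the paper relies on for performance, are not modeled.

```python
from itertools import product
from math import prod


def group_offsets(idx, ext, group):
    """Flat row-major offsets of every multi-index over `group` in a tensor
    labelled `idx`; indices outside `group` are held at 0."""
    stride, acc = {}, 1
    for label in reversed(idx):          # last index varies fastest
        stride[label] = acc
        acc *= ext[label]
    ranges = [range(ext[g]) for g in group]
    return [sum(v * stride[g] for g, v in zip(group, combo))
            for combo in product(*ranges)]


def macro_kernel(a_pack, b_pack, c, cm, cn, mb, nb, kb):
    """Scalar stand-in for a tuned macro-kernel: C_block += A_pack @ B_pack."""
    for i in range(mb):
        for j in range(nb):
            acc = 0.0
            for p in range(kb):
                acc += a_pack[i * kb + p] * b_pack[p * nb + j]
            c[cm[i] + cn[j]] += acc


def gett_sketch(a_idx, a, b_idx, b, c_idx, ext, i_m, i_n, i_k,
                mc=4, nc=4, kc=4):
    """C[c_idx] = sum over i_k of A[a_idx] * B[b_idx], with GEMM-style blocking."""
    am, ak = group_offsets(a_idx, ext, i_m), group_offsets(a_idx, ext, i_k)
    bk, bn = group_offsets(b_idx, ext, i_k), group_offsets(b_idx, ext, i_n)
    cm, cn = group_offsets(c_idx, ext, i_m), group_offsets(c_idx, ext, i_n)
    M, N, K = len(am), len(bn), len(ak)
    c = [0.0] * prod(ext[i] for i in c_idx)
    for j0 in range(0, N, nc):                 # block of the n-space
        nb = min(nc, N - j0)
        for p0 in range(0, K, kc):             # block of the k-space
            kb = min(kc, K - p0)
            # Pack a kb x nb block of B, row by row, into a contiguous buffer.
            b_pack = [b[bk[p] + bn[j]]
                      for p in range(p0, p0 + kb) for j in range(j0, j0 + nb)]
            for i0 in range(0, M, mc):         # block of the m-space
                mb = min(mc, M - i0)
                # Pack an mb x kb block of A straight from the strided tensor.
                a_pack = [a[am[i] + ak[p]]
                          for i in range(i0, i0 + mb) for p in range(p0, p0 + kb)]
                macro_kernel(a_pack, b_pack, c, cm[i0:i0 + mb],
                             cn[j0:j0 + nb], mb, nb, kb)
    return c
```

For C[a,b,c] = Σ_k A[a,k,c]·B[k,b], a call would be gett_sketch("akc", A, "kb", B, "abc", ext, ("a", "c"), ("b",), ("k",)) with ext mapping each label to its extent. The packed buffers hold at most mc·kc and kc·nc elements, bounded by the block sizes rather than the tensor sizes, so no full transposed copy of A or B is ever formed.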
