TL;DR
This paper introduces GETT, a high-performance tensor contraction method modeled on GEMM. It achieves significant speedups by using cache-aware, fully vectorized kernels with unit-stride memory accesses, without requiring auxiliary memory.
Contribution
GETT systematically reduces tensor contractions to optimized GEMM-like kernels, enabling high performance and cache efficiency without auxiliary memory, outperforming existing methods.
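The reduction can be written compactly. This sketch borrows the abstract's idea of three index sets and assumes the notation below: $I_m$ are free indices of $\mathcal{A}$, $I_n$ are free indices of $\mathcal{B}$, $I_k$ are contracted indices, and $\Pi^A, \Pi^B, \Pi^C$ are the index orderings (memory layouts) of each tensor:

```latex
\mathcal{C}_{\Pi^C(I_m \cup I_n)} = \sum_{I_k} \mathcal{A}_{\Pi^A(I_m \cup I_k)} \, \mathcal{B}_{\Pi^B(I_k \cup I_n)}
% Example: C_{abc} = \sum_{k} A_{akc} B_{kb}
%   I_m = \{a, c\},\quad I_n = \{b\},\quad I_k = \{k\}
% Treating each index set as one "super-index" gives a GEMM
%   C_{mn} = \sum_{k} A_{mk} B_{kn},\quad M = \prod_{i \in I_m} |i|,\ N = \prod_{i \in I_n} |i|,\ K = \prod_{i \in I_k} |i|
% but the super-indices are generally not unit-stride in memory, which is why sub-tensors are packed.
```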
Findings
GETT outperforms existing approaches by up to 12.4x in bandwidth-bound cases.
GETT achieves up to 1.41x speedup over equivalent GEMM for certain tensor contractions.
GETT reaches up to 91.3% of peak floating-point performance.
Abstract
We present "GEMM-like Tensor-Tensor multiplication" (GETT), a novel approach to tensor contractions that mirrors the design of a high-performance general matrix-matrix multiplication (GEMM). The critical insight behind GETT is the identification of three index sets involved in the tensor contraction, which enable us to systematically reduce an arbitrary tensor contraction to loops around a highly tuned "macro-kernel". This macro-kernel operates on suitably prepared ("packed") sub-tensors that reside in a specified level of the cache hierarchy. In contrast to previous approaches to tensor contractions, GETT exhibits desirable features such as unit-stride memory accesses, cache-awareness, and full vectorization, without requiring auxiliary memory. To compare our technique with other modern tensor contraction methods, we integrate GETT alongside the so-called…
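The reduction described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Tensors are flat row-major lists and indices are single letters. There are no batch indices shared by A, B and C. The block sizes `mc`, `nc`, `kc` and the names `gett`, `offsets` and `macro_kernel` are hypothetical. The sketch shows the three index sets, the packing of sub-tensor blocks into contiguous buffers, and a GEMM-style macro-kernel that runs on those buffers. The real GETT uses register-blocked packing and vectorized micro-kernels instead of plain loops.

```python
from itertools import product


def strides(shape):
    """Row-major strides for a dense tensor of the given shape."""
    out, acc = [], 1
    for d in reversed(shape):
        out.append(acc)
        acc *= d
    return list(reversed(out))


def offsets(idx, group, sizes):
    """Memory offsets (within a tensor indexed by `idx`) of every combination
    of the indices in `group`, enumerated row-major in `group` order."""
    st = strides([sizes[i] for i in idx])
    pos = [idx.index(g) for g in group]
    return [sum(st[p] * v for p, v in zip(pos, combo))
            for combo in product(*(range(sizes[g]) for g in group))]


def macro_kernel(ap, bp, c, cm, cn, mb, nb, kb):
    """Plain GEMM on packed (unit-stride) buffers; scatters the block into C."""
    for i in range(mb):
        for j in range(nb):
            acc = 0.0
            for p in range(kb):
                acc += ap[i * kb + p] * bp[p * nb + j]
            c[cm[i] + cn[j]] += acc


def gett(a, a_idx, b, b_idx, c_idx, sizes, mc=4, nc=4, kc=4):
    """C[c_idx] = sum over contracted indices of A[a_idx] * B[b_idx].
    Hypothetical sketch: assumes no batch indices; mc/nc/kc are made-up block sizes."""
    # The three index sets: free in A (I_m), free in B (I_n), contracted (I_k).
    i_m = [i for i in c_idx if i in a_idx]
    i_n = [i for i in c_idx if i in b_idx]
    i_k = [i for i in a_idx if i in b_idx and i not in c_idx]
    # Offsets are additive across index sets, so A[m, k] lives at am[m] + ak[k].
    am, ak = offsets(a_idx, i_m, sizes), offsets(a_idx, i_k, sizes)
    bk, bn = offsets(b_idx, i_k, sizes), offsets(b_idx, i_n, sizes)
    cm, cn = offsets(c_idx, i_m, sizes), offsets(c_idx, i_n, sizes)
    M, N, K = len(am), len(bn), len(ak)
    c = [0.0] * (M * N)
    # GEMM-style loop nest (assumed here): n-blocks, then k-blocks, then m-blocks.
    for n0 in range(0, N, nc):
        nb = min(nc, N - n0)
        for k0 in range(0, K, kc):
            kb = min(kc, K - k0)
            # Pack a kb x nb block of B into a contiguous buffer.
            bp = [b[bk[k0 + p] + bn[n0 + j]] for p in range(kb) for j in range(nb)]
            for m0 in range(0, M, mc):
                mb = min(mc, M - m0)
                # Pack an mb x kb block of A into a contiguous buffer.
                ap = [a[am[m0 + i] + ak[k0 + p]] for i in range(mb) for p in range(kb)]
                macro_kernel(ap, bp, c, cm[m0:m0 + mb], cn[n0:n0 + nb], mb, nb, kb)
    return c
```

For example, `gett([1.0, 2.0, 3.0, 4.0], "mk", [5.0, 6.0, 7.0, 8.0], "kn", "mn", {"m": 2, "n": 2, "k": 2})` returns `[19.0, 22.0, 43.0, 50.0]`, which is the ordinary 2×2 matrix product. Memory offsets are additive across index sets. As a result, only the packing step touches the original strided layout, and the macro-kernel always sees unit-stride buffers.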
