HPTT: A High-Performance Tensor Transposition C++ Library

Paul Springer; Tong Su; Paolo Bientinesi

arXiv:1704.04374·cs.MS·May 12, 2017

TL;DR

HPTT is an open-source C++ library for tensor transpositions whose sizes and permutations are known only at runtime. It combines blocking, multi-threading, and explicit vectorization with optional autotuning, performs well across architectures, and yields up to 3.1x speedups in tensor contractions.

Contribution

Introduces HPTT, a modular, architecture-portable tensor transposition library with optional autotuning, aimed at applications whose tensor sizes and permutations are determined at runtime.

Findings

01. Achieves bandwidth comparable to SAXPY.

02. Yields up to 3.1x speedup in tensor contractions.

03. Performs well across diverse architectures.

Abstract

Recently we presented TTC, a domain-specific compiler for tensor transpositions. Although the generated code performs nearly optimally, TTC works offline and therefore cannot be used in application codes where the tensor sizes and the necessary tensor permutations are determined at runtime. To overcome this limitation, we introduce the open-source C++ library High-Performance Tensor Transposition (HPTT). Similar to TTC, HPTT incorporates optimizations such as blocking, multi-threading, and explicit vectorization; furthermore, it decomposes any transposition into multiple loops around a so-called micro-kernel. This modular design, inspired by BLIS, makes HPTT easy to port to different architectures: only the hand-vectorized micro-kernel (e.g., a 4x4 transpose) has to be replaced. HPTT also offers an optional autotuning framework, guided by a performance…
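The "loops around a micro-kernel" idea from the abstract can be illustrated with a minimal sketch for the simplest case, a 2D row-major matrix transpose. This is not HPTT's API or implementation: the names `micro_kernel_4x4` and `transpose_blocked` are hypothetical, the micro-kernel is plain scalar code rather than hand-vectorized, and multi-threading and autotuning are left out.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative sketch only; none of these names come from HPTT.
constexpr std::size_t kTile = 4;  // tile edge, matching the 4x4 example

// Hypothetical micro-kernel: transposes one 4x4 tile. A ported version
// would swap this scalar loop for architecture-specific vector code.
inline void micro_kernel_4x4(const float* in, std::size_t ld_in,
                             float* out, std::size_t ld_out) {
    for (std::size_t i = 0; i < kTile; ++i)
        for (std::size_t j = 0; j < kTile; ++j)
            out[j * ld_out + i] = in[i * ld_in + j];
}

// Writes the transpose of the row-major (rows x cols) matrix `in`
// into the row-major (cols x rows) matrix `out`.
inline void transpose_blocked(const float* in, float* out,
                              std::size_t rows, std::size_t cols) {
    assert(in != out);  // in-place transposition is not supported here

    const std::size_t full_rows = rows - rows % kTile;
    const std::size_t full_cols = cols - cols % kTile;

    // Outer loops walk over complete tiles and delegate to the micro-kernel.
    for (std::size_t i0 = 0; i0 < full_rows; i0 += kTile)
        for (std::size_t j0 = 0; j0 < full_cols; j0 += kTile)
            micro_kernel_4x4(in + i0 * cols + j0, cols,
                             out + j0 * rows + i0, rows);

    // Remainder: columns to the right of the last complete tile.
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = full_cols; j < cols; ++j)
            out[j * rows + i] = in[i * cols + j];

    // Remainder: rows below the last complete tile.
    for (std::size_t i = full_rows; i < rows; ++i)
        for (std::size_t j = 0; j < full_cols; ++j)
            out[j * rows + i] = in[i * cols + j];
}
```

In this sketch, only `micro_kernel_4x4` would need to change when targeting a different instruction set; the surrounding loops and the remainder handling stay the same, which is the portability argument the abstract makes for the modular design.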
