Tensor Processing Primitives: A Programming Abstraction for Efficiency and Portability in Deep Learning & HPC Workloads
Evangelos Georganas, Dhiraj Kalamkar, Sasikanth Avancha, Menachem Adelman, Deepti Aggarwal, Cristina Anderson, Alexander Breuer, Jeremy Bruestle, Narendra Chaudhary, Abhisek Kundu, Denise Kutnick, Frank Laub, Vasimuddin Md, Sanchit Misra, Ramanarayan Mohanty, Hans Pabst

TL;DR
This paper introduces Tensor Processing Primitives (TPP), a versatile programming abstraction that enhances efficiency and portability in deep learning and high-performance computing workloads by using a compact set of tensor operators.
Contribution
The work presents TPP as a platform-agnostic yet highly optimized set of tensor operators, enabling portable and efficient implementation of complex DL workloads.
Findings
TPP-based kernels outperform state-of-the-art implementations.
End-to-end DL and HPC workloads built with TPP achieve higher performance than state-of-the-art implementations.
TPP provides a platform-agnostic programming model for tensor computations.
Abstract
During the past decade, novel Deep Learning (DL) algorithms, workloads and hardware have been developed to tackle a wide range of problems. Despite the advances in workload and hardware ecosystems, the programming methodology of DL systems is stagnant. DL workloads leverage either highly-optimized, yet platform-specific and inflexible kernels from DL libraries, or in the case of novel operators, reference implementations are built via DL framework primitives with underwhelming performance. This work introduces the Tensor Processing Primitives (TPP), a programming abstraction striving for efficient, portable implementation of DL workloads with high productivity. TPPs define a compact, yet versatile set of 2D-tensor operators (or a virtual Tensor ISA), which subsequently can be utilized as building blocks to construct complex operators on high-dimensional tensors. The TPP specification is…
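The building-block idea in the abstract can be pictured with a small sketch. It assumes two hypothetical 2D primitives (`gemm_2d`, `relu_2d`) and a hypothetical blocked layout in which a larger tensor is stored as a grid of 2D blocks. None of these names come from the TPP specification, and plain Python lists stand in for optimized kernels. The outer loops walk the blocks, and all arithmetic happens inside the 2D primitives.

```python
# Illustrative sketch only: the function names below are hypothetical
# and are NOT the actual TPP API.

def gemm_2d(a, b, c):
    """2D primitive: accumulate a (m x k) times b (k x n) into c (m x n), in place."""
    for i in range(len(a)):
        for j in range(len(b[0])):
            acc = c[i][j]
            for p in range(len(b)):
                acc += a[i][p] * b[p][j]
            c[i][j] = acc


def relu_2d(x):
    """2D primitive: element-wise ReLU on a 2D block, in place."""
    for row in x:
        for j in range(len(row)):
            if row[j] < 0:
                row[j] = 0


def blocked_linear_relu(a_blocks, b_blocks):
    """Operator on a blocked (4D) layout, composed from the 2D primitives.

    a_blocks[mb][kb] and b_blocks[kb][nb] are 2D blocks.
    Returns c_blocks[mb][nb] = relu(sum over kb of a_blocks[mb][kb] @ b_blocks[kb][nb]).
    """
    num_mb, num_kb, num_nb = len(a_blocks), len(b_blocks), len(b_blocks[0])
    c_blocks = []
    for mb in range(num_mb):
        c_row = []
        for nb in range(num_nb):
            m = len(a_blocks[mb][0])      # rows in an A block
            n = len(b_blocks[0][nb][0])   # columns in a B block
            c = [[0] * n for _ in range(m)]
            for kb in range(num_kb):      # reduce over the K blocks
                gemm_2d(a_blocks[mb][kb], b_blocks[kb][nb], c)
            relu_2d(c)                    # apply the activation once per output block
            c_row.append(c)
        c_blocks.append(c_row)
    return c_blocks


a = [[[[1, 2], [3, 4]], [[1, 0], [0, 1]]]]        # 1 x 2 grid of 2x2 blocks
b = [[[[1, 0], [0, -1]]], [[[-5, 0], [0, 1]]]]    # 2 x 1 grid of 2x2 blocks
result = blocked_linear_relu(a, b)
```

In this toy layout, only the two primitives would need a platform-specific implementation; the loop nest around them stays the same.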
