UNIT: Unifying Tensorized Instruction Compilation

Jian Weng, Animesh Jain, Jie Wang, Leyuan Wang, Yida Wang, and Tony Nowatzki

arXiv:2101.08458·cs.PL·March 30, 2021

TL;DR

This paper introduces UNIT, a unified compiler framework that simplifies and automates the utilization of tensorized instructions across different hardware platforms, significantly improving DNN inference performance.

Contribution

We develop a unified compiler framework that abstracts and automates the compilation of tensorized instructions from multiple hardware vendors, enabling easier integration and optimization.

Findings

01. Achieves a 1.3x speedup over Intel oneDNN on x86 CPU

02. Achieves a 1.75x speedup over Nvidia cuDNN on Nvidia GPU

03. Achieves a 1.13x speedup over tuned TVM on ARM CPU

Abstract

Because of the increasing demand for computation in DNNs, researchers develop both hardware and software mechanisms to reduce the compute and memory burden. A widely adopted approach is to use mixed-precision data types. However, it is hard to leverage mixed precision without hardware support because of the overhead of data casting. Hardware vendors offer tensorized instructions for mixed-precision tensor operations, like Intel VNNI, Tensor Core, and ARM-DOT. These instructions involve a computing idiom that reduces multiple low-precision elements into one high-precision element. The lack of compilation techniques for this idiom makes it hard to utilize these instructions: using vendor-provided libraries for computationally intensive kernels is inflexible and prevents further optimizations, and manually writing hardware intrinsics is error-prone and difficult for programmers. Some prior works…
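The computing idiom the abstract describes, several low-precision elements reduced into one high-precision element, can be illustrated with a small scalar model. This is only a sketch under stated assumptions: the function name `dot_accumulate`, the group size of 4, and the int8 input range are chosen for illustration and are not taken from the paper or from any vendor's instruction definition. It also does not model 32-bit overflow or saturation.

```python
from typing import List

GROUP = 4  # assumed number of low-precision elements reduced into each lane


def dot_accumulate(acc: List[int], a: List[int], b: List[int]) -> List[int]:
    """Scalar model of a mixed-precision dot-product-accumulate idiom.

    For each accumulator lane i, computes
        acc[i] + sum(a[i*GROUP + k] * b[i*GROUP + k] for k in range(GROUP))
    where a and b hold low-precision (here: int8-range) values and the
    accumulator is treated as a wider integer. Hypothetical helper, not a
    real intrinsic.
    """
    if len(a) != len(acc) * GROUP or len(b) != len(a):
        raise ValueError("a and b must each hold GROUP elements per lane")
    for v in a + b:
        if not -128 <= v <= 127:
            raise ValueError("inputs must fit in the assumed int8 range")

    out = []
    for lane, total in enumerate(acc):
        base = lane * GROUP
        # Reduce GROUP low-precision products into one wide accumulator.
        for k in range(GROUP):
            total += a[base + k] * b[base + k]
        out.append(total)
    return out


if __name__ == "__main__":
    acc = [0, 10]
    a = [1, 2, 3, 4, 5, 6, 7, 8]
    b = [1, 1, 1, 1, -1, -1, -1, -1]
    print(dot_accumulate(acc, a, b))  # [10, -16]
```

A hardware instruction of this kind performs the inner loop for all lanes at once; the difficulty the abstract points to is getting a compiler to recognize loops of this shape and map them onto such instructions.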
