Vyasa: A High-Performance Vectorizing Compiler for Tensor Convolutions on the Xilinx AI Engine

Prasanth Chatarasi; Stephen Neuendorffer; Samuel Bayliss; Kees Vissers; Vivek Sarkar

arXiv:2006.01331·cs.DC·June 3, 2020

TL;DR

Vyasa is a compiler extension that automatically generates efficient code for tensor convolutions on Xilinx's AI Engine, significantly simplifying programming and improving performance compared to expert-written code.

Contribution

It introduces Vyasa, a novel compiler system extending Halide to automatically optimize tensor convolutions for the Xilinx AI Engine.
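For readers unfamiliar with the workload, the sketch below shows what a tensor convolution computes and how its multiply-accumulate (MAC) operations are counted. It is a plain scalar reference, assuming a single-channel input, unit stride, and no padding; the function name `conv2d_reference` is hypothetical, and this is neither Vyasa's generated code nor Halide syntax.

```python
def conv2d_reference(image, kernel):
    """Scalar CONV2D (unit stride, no padding) that also counts MACs.

    `image` and `kernel` are lists of lists of numbers. As in most
    deep-learning frameworks, the kernel is applied without flipping.
    """
    height, width = len(image), len(image[0])
    k_height, k_width = len(kernel), len(kernel[0])
    out_height = height - k_height + 1
    out_width = width - k_width + 1

    output = [[0] * out_width for _ in range(out_height)]
    macs = 0
    for y in range(out_height):
        for x in range(out_width):
            acc = 0
            for r in range(k_height):
                for s in range(k_width):
                    # One multiply-accumulate per kernel tap per output pixel.
                    acc += image[y + r][x + s] * kernel[r][s]
                    macs += 1
            output[y][x] = acc
    return output, macs


# Illustrative data: a 4x4 image and a 3x3 kernel of ones.
image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
output, macs = conv2d_reference(image, kernel)
print(output, macs)
```

A vectorizing compiler aims to execute many of these inner-loop MACs in each cycle, which is why throughput is reported in MACs/cycle.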

Findings

01

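Achieved geometric means of 7.6 and 23.3 MACs/cycle for 32-bit and 16-bit operands across 36 CONV2D and 6 CONV3D workloads (95.9% and 72.8% of peak performance, respectively).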

02

Demonstrated a 1.10x performance improvement over expert-written code on the 4 workloads for which such code was available.

03

Produced code 50x smaller than expert-written implementations.

Abstract

Xilinx's AI Engine is a recent industry example of energy-efficient vector processing that includes novel support for 2D SIMD datapaths and a shuffle interconnection network. The current approach to programming the AI Engine relies on a C/C++ API for vector intrinsics. While an advance over assembly-level programming, it requires the programmer to specify a number of low-level operations based on detailed knowledge of the hardware. To address these challenges, we introduce Vyasa, a new programming system that extends the Halide DSL compiler to automatically generate code for the AI Engine. We evaluated Vyasa on 36 CONV2D and 6 CONV3D workloads, and achieved geometric means of 7.6 and 23.3 MACs/cycle for 32-bit and 16-bit operands (which represent 95.9% and 72.8% of the peak performance respectively). For 4 of these workloads for which expert-written codes were available to us, Vyasa…
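The reported throughput and peak percentages can be cross-checked with a little arithmetic. The sketch below back-computes the peak MACs/cycle implied by the abstract's figures; the helper name `implied_peak` is hypothetical, and the resulting peaks are inferred from the reported numbers rather than stated on this page.

```python
def implied_peak(achieved_macs_per_cycle, fraction_of_peak):
    """Peak throughput implied by an achieved rate and its share of peak."""
    return achieved_macs_per_cycle / fraction_of_peak


# (achieved MACs/cycle, fraction of peak) as reported in the abstract.
reported = {
    "32-bit": (7.6, 0.959),
    "16-bit": (23.3, 0.728),
}

implied = {
    width: implied_peak(achieved, fraction)
    for width, (achieved, fraction) in reported.items()
}

for width, peak in implied.items():
    print(f"{width}: implied peak of about {peak:.2f} MACs/cycle")
```

Under this reading, the figures are consistent with peaks of roughly 8 MACs/cycle for 32-bit operands and 32 MACs/cycle for 16-bit operands, so the 16-bit results leave more headroom relative to peak.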
