A Compilation Flow for the Generation of CNN Inference Accelerators on FPGAs

Seung-Hun Chung; Tarek S. Abdelrahman

arXiv:2203.04015 · cs.DC · March 9, 2022 · 1 citation

TL;DR

This paper introduces a compilation flow that uses TVM and the Intel OpenCL SDK to generate optimized CNN inference accelerators on FPGAs, improving performance by up to 846X over the unoptimized base accelerators.

Contribution

It automates, within TVM, a set of optimizations to the base OpenCL kernels that increase parallelism, reduce memory access latency, increase concurrency and save on-chip resources in the generated FPGA accelerators.

Findings

01. The optimizations improve performance by up to 846X over the base accelerators.

02. The optimized accelerators outperform TensorFlow on CPU by up to 4.57X and single-threaded TVM by up to 3.83X.

03. Achieves near-TVM multi-threaded performance with significantly less complexity.

Abstract

We present a compilation flow for the generation of CNN inference accelerators on FPGAs. The flow translates a frozen model into OpenCL kernels with the TVM compiler and uses the Intel OpenCL SDK to compile to an FPGA bitstream. We improve the quality of the generated hardware with optimizations applied to the base OpenCL kernels generated by TVM. These optimizations increase parallelism, reduce memory access latency, increase concurrency and save on-chip resources. We automate these optimizations in TVM and evaluate them by generating accelerators for LeNet-5, MobileNetV1 and ResNet-34 on an Intel Stratix 10SX. We show that the optimizations improve the performance of the generated accelerators by up to 846X over the base accelerators. The performance of the optimized accelerators is up to 4.57X better than TensorFlow on CPU, 3.83X better than single-threaded TVM and is only 0.34X…
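
As a rough illustration of what "increasing parallelism" in a convolution kernel can mean, the sketch below contrasts a plain direct-convolution loop nest with a version whose output-column loop is unrolled by a fixed factor. This is an assumption-laden example, not the paper's method: the abstract does not say which transformations are applied, the function names `conv2d_direct` and `conv2d_unrolled` and the `factor` parameter are hypothetical, and the code is plain Python rather than TVM-generated OpenCL.

```python
def conv2d_direct(image, kernel):
    """Direct 2D convolution as used in CNNs (no kernel flip, valid padding, stride 1)."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = h - kh + 1, w - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[y + i][x + j] * kernel[i][j]
            out[y][x] = acc
    return out


def conv2d_unrolled(image, kernel, factor=2):
    """Same computation with the output-column loop unrolled by `factor` (hypothetical).

    Each entry of `accs` is an independent accumulator. In a hardware flow,
    independent accumulators like these are what could be mapped to parallel
    multiply-accumulate units; here they simply run one after another.
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = h - kh + 1, w - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x0 in range(0, out_w, factor):
            lanes = min(factor, out_w - x0)  # handle a partial final group
            accs = [0.0] * lanes
            for i in range(kh):
                for j in range(kw):
                    weight = kernel[i][j]  # one weight read shared by all lanes
                    for lane in range(lanes):
                        accs[lane] += image[y + i][x0 + lane + j] * weight
            for lane in range(lanes):
                out[y][x0 + lane] = accs[lane]
    return out
```

Both functions accumulate each output in the same order, so they return identical results; the unrolled form only restructures the loop nest so that several outputs share each weight read.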

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · Embedded Systems Design Techniques

Methods: Pointwise Convolution · Average Pooling · Depthwise Convolution · Depthwise Separable Convolution · Convolution · Global Average Pooling · Balanced Selection · 1x1 Convolution · Dense Connections