A Compilation Flow for the Generation of CNN Inference Accelerators on FPGAs
Seung-Hun Chung, Tarek S. Abdelrahman

TL;DR
This paper introduces a compilation flow that generates optimized CNN inference accelerators on FPGAs using TVM and OpenCL, achieving significant performance improvements and flexibility over existing high-level synthesis methods.
Contribution
It presents an automated optimization process within TVM for FPGA-based CNN accelerators that improves performance and resource efficiency compared to the unoptimized base accelerators and to some existing approaches.
Findings
Performance improved by up to 846X over the base (unoptimized) accelerators
Accelerators outperform TensorFlow on CPU and single-threaded TVM by up to 4.57X and 3.83X, respectively
Approaches the performance of multi-threaded TVM with significantly less complexity
Abstract
We present a compilation flow for the generation of CNN inference accelerators on FPGAs. The flow translates a frozen model into OpenCL kernels with the TVM compiler and uses the Intel OpenCL SDK to compile to an FPGA bitstream. We improve the quality of the generated hardware with optimizations applied to the base OpenCL kernels generated by TVM. These optimizations increase parallelism, reduce memory access latency, increase concurrency and save on-chip resources. We automate these optimizations in TVM and evaluate them by generating accelerators for LeNet-5, MobileNetV1 and ResNet-34 on an Intel Stratix 10 SX. We show that the optimizations improve the performance of the generated accelerators by up to 846X over the base accelerators. The performance of the optimized accelerators is up to 4.57X better than TensorFlow on CPU, 3.83X better than single-threaded TVM and is only 0.34X…
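The abstract states what the kernel optimizations aim for (more parallelism, lower memory access latency, more concurrency, fewer on-chip resources) but not what they look like. The following is a hypothetical illustration only. It is not the authors' TVM schedule or their generated OpenCL. It shows one common kind of loop-nest restructuring for a convolution: output channels are tiled so that an innermost loop of fixed width could be unrolled into parallel multiply-accumulate lanes, and each input pixel is read once and shared across those lanes. The function names `conv2d_naive` and `conv2d_tiled`, the `tile` parameter, and the valid-padding, stride-1 layout are all assumptions made for this sketch.

```python
# Hypothetical sketch: NOT the paper's TVM schedule or generated OpenCL.
# Assumed layout: x[C_in][H][W], w[C_out][C_in][K][K], valid padding, stride 1.

def conv2d_naive(x, w):
    """Direct convolution, one output value at a time."""
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    c_out, k = len(w), len(w[0][0])
    oh, ow = h - k + 1, wd - k + 1
    y = [[[0.0] * ow for _ in range(oh)] for _ in range(c_out)]
    for co in range(c_out):
        for i in range(oh):
            for j in range(ow):
                acc = 0.0
                for ci in range(c_in):
                    for ki in range(k):
                        for kj in range(k):
                            acc += x[ci][i + ki][j + kj] * w[co][ci][ki][kj]
                y[co][i][j] = acc
    return y


def conv2d_tiled(x, w, tile=4):
    """Same result, with output channels processed in tiles of `tile`.

    The innermost `t` loop has a fixed, small trip count, which is the kind
    of loop an HLS compiler could fully unroll into `tile` parallel
    multiply-accumulate units. Each input pixel is read once per (ci, ki, kj)
    and reused by every lane.
    """
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    c_out, k = len(w), len(w[0][0])
    oh, ow = h - k + 1, wd - k + 1
    y = [[[0.0] * ow for _ in range(oh)] for _ in range(c_out)]
    for co0 in range(0, c_out, tile):
        width = min(tile, c_out - co0)  # the last tile may be partial
        # Stands in for a small on-chip weight buffer. A real kernel would
        # copy the tile there; in Python these are just references.
        local_w = [w[co0 + t] for t in range(width)]
        for i in range(oh):
            for j in range(ow):
                acc = [0.0] * width  # one accumulator per parallel lane
                for ci in range(c_in):
                    for ki in range(k):
                        for kj in range(k):
                            px = x[ci][i + ki][j + kj]  # shared by all lanes
                            for t in range(width):  # candidate for unrolling
                                acc[t] += px * local_w[t][ci][ki][kj]
                for t in range(width):
                    y[co0 + t][i][j] = acc[t]
    return y


if __name__ == "__main__":
    # Tiny check: a 2x2 identity-diagonal kernel over a 2x2 input gives 1 + 4.
    print(conv2d_tiled([[[1, 2], [3, 4]]], [[[[1, 0], [0, 1]]]]))
```

Both functions accumulate each output in the same order, so for the same inputs they return identical results. Only the loop structure differs, and that structure is what would be mapped onto FPGA hardware.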
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · Embedded Systems Design Techniques
Methods: Pointwise Convolution · Average Pooling · Depthwise Convolution · Depthwise Separable Convolution · Convolution · Global Average Pooling · Balanced Selection · 1x1 Convolution · Dense Connections
