A CNN Accelerator on FPGA Using Depthwise Separable Convolution
Lin Bai, Yiming Zhao, Xinming Huang

TL;DR
This paper presents a scalable FPGA-based CNN accelerator optimized for depthwise separable convolutions, enabling high-speed image classification with significant speedup over CPU implementations.
Contribution
It introduces a scalable FPGA accelerator tailored for depthwise separable CNNs, demonstrating high performance and adaptability across different FPGA sizes.
Findings
Achieves 266.6 frames per second on ImageNet classification
Provides 20x speedup over CPU implementations
Successfully implements MobileNetV2 on Arria 10 FPGA
Abstract
Convolutional neural networks (CNNs) have been widely deployed in computer vision and pattern recognition because of their high accuracy. However, large convolution operations are computationally intensive and often require a powerful computing platform such as a Graphics Processing Unit (GPU). This makes it difficult to apply CNNs to portable devices. State-of-the-art CNNs such as MobileNetV2 and Xception adopt depthwise separable convolution in place of standard convolution for embedded platforms, which significantly reduces operations and parameters with only a limited loss in accuracy. This highly structured model is well suited to Field-Programmable Gate Array (FPGA) implementation. In this paper, a scalable, high-performance CNN accelerator optimized for depthwise separable convolution is proposed. The accelerator can fit into FPGAs of different sizes, provided…
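The abstract's claim that depthwise separable convolution reduces operations and parameters can be checked with a short count. The sketch below is illustrative only and is not taken from the paper: the function names and the layer dimensions (3x3 kernel, 32 input channels, 64 output channels, 112x112 output) are hypothetical, biases are ignored, and each multiply-accumulate (MAC) is counted as one operation.

```python
def standard_conv_cost(k, c_in, c_out, h_out, w_out):
    """Parameters and MACs for a standard k x k convolution (no bias)."""
    params = k * k * c_in * c_out
    macs = params * h_out * w_out  # every weight is used once per output pixel
    return params, macs


def depthwise_separable_cost(k, c_in, c_out, h_out, w_out):
    """Parameters and MACs for a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise_params = k * k * c_in   # one k x k filter per input channel
    pointwise_params = c_in * c_out   # 1 x 1 conv that mixes channels
    params = depthwise_params + pointwise_params
    macs = params * h_out * w_out
    return params, macs


if __name__ == "__main__":
    # Hypothetical layer shape, chosen only for illustration.
    k, c_in, c_out, h, w = 3, 32, 64, 112, 112
    std_params, std_macs = standard_conv_cost(k, c_in, c_out, h, w)
    sep_params, sep_macs = depthwise_separable_cost(k, c_in, c_out, h, w)
    print(f"standard:  {std_params} params, {std_macs} MACs")
    print(f"separable: {sep_params} params, {sep_macs} MACs")
    print(f"cost ratio (separable / standard): {sep_macs / std_macs:.4f}")
```

Under these assumptions the cost ratio equals 1/c_out + 1/k^2, so for 3x3 kernels the separable form needs roughly one eighth to one ninth of the work of a standard convolution as the number of output channels grows.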
Taxonomy
Topics: Advanced Neural Network Applications · Video Surveillance and Tracking Methods · Human Pose and Action Recognition
