A Reconfigurable Streaming Deep Convolutional Neural Network Accelerator for Internet of Things
Li Du, Yuan Du, Yilei Li, Mau-Chung Frank Chang

TL;DR
This paper presents a reconfigurable streaming CNN accelerator optimized for IoT devices. It achieves high throughput and energy efficiency through techniques such as filter decomposition, which supports arbitrary convolution window sizes, and max pooling computed in parallel with convolution.
Contribution
The paper introduces a novel reconfigurable streaming CNN accelerator supporting arbitrary convolution sizes with improved energy efficiency for IoT applications.
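Supporting arbitrary convolution sizes on fixed-size hardware is usually done by decomposition. The sketch below is a minimal illustration that assumes a 3×3 compute engine. It zero-pads a KH×KW kernel up to a multiple of 3, splits it into 3×3 sub-kernels, and runs each sub-kernel on the input window shifted by that sub-kernel's offset. The partial sums are then accumulated. The tile size and the names `conv2d_valid`, `tile_mac`, and `decomposed_conv2d` are hypothetical. This is not the paper's exact hardware dataflow.

```python
from typing import List

Matrix = List[List[int]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Reference: direct CNN-style convolution (cross-correlation), stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + u][j + v] * kernel[u][v]
                 for u in range(kh) for v in range(kw))
             for j in range(ow)]
            for i in range(oh)]


def tile_mac(image: Matrix, sub: Matrix, y0: int, x0: int) -> int:
    """Hypothetical fixed-size MAC engine: one T x T dot product anchored at (y0, x0).
    Taps falling outside the image only ever carry zero-padded weights, so they are skipped."""
    acc = 0
    for u, row in enumerate(sub):
        for v, w in enumerate(row):
            y, x = y0 + u, x0 + v
            if y < len(image) and x < len(image[0]):
                acc += image[y][x] * w
    return acc


def decomposed_conv2d(image: Matrix, kernel: Matrix, tile: int = 3) -> Matrix:
    """Compute a KH x KW convolution using only tile x tile sub-kernels (assumed tile = 3)."""
    kh, kw = len(kernel), len(kernel[0])
    # Round kernel dimensions up to a multiple of the tile size and zero-pad the kernel.
    ph, pw = -(-kh // tile) * tile, -(-kw // tile) * tile
    padded = [[kernel[u][v] if u < kh and v < kw else 0 for v in range(pw)]
              for u in range(ph)]
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0] * ow for _ in range(oh)]
    # Each sub-kernel reads the input shifted by its offset (bu, bv); partial sums accumulate.
    for bu in range(0, ph, tile):
        for bv in range(0, pw, tile):
            sub = [row[bv:bv + tile] for row in padded[bu:bu + tile]]
            for i in range(oh):
                for j in range(ow):
                    out[i][j] += tile_mac(image, sub, i + bu, j + bv)
    return out


if __name__ == "__main__":
    img = [[(3 * r + 5 * c) % 7 - 3 for c in range(8)] for r in range(8)]
    k5 = [[(u * v + u + 1) % 5 - 2 for v in range(5)] for u in range(5)]
    # A 5x5 kernel becomes four 3x3 passes (padded to 6x6) but yields the same result.
    assert decomposed_conv2d(img, k5) == conv2d_valid(img, k5)
```

In this sketch, a 5×5 kernel costs four 3×3 passes, and some taps in those passes carry padded zero weights. That waste is the trade-off for keeping the multiply-accumulate array a single fixed size.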
Findings
Supports major CNNs with 152 GOPS peak throughput
Achieves 434 GOPS/W energy efficiency at 350 mW
Computes max pooling in parallel with convolution to improve throughput
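The efficiency figure is consistent with the other two numbers. This assumes the 434 GOPS/W is measured at the same operating point as the 152 GOPS peak and 350 mW power, as the abstract's wording suggests:

```latex
\text{Energy efficiency} = \frac{\text{Throughput}}{\text{Power}}
  = \frac{152\ \text{GOPS}}{0.35\ \text{W}} \approx 434\ \text{GOPS/W}
```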
Abstract
Convolutional neural networks (CNNs) offer high accuracy in image detection. To implement CNN-based image detection in Internet of Things (IoT) devices, a streaming hardware accelerator is proposed. The proposed accelerator optimizes energy efficiency by avoiding unnecessary data movement. With a unique filter decomposition technique, the accelerator can support arbitrary convolution window sizes. In addition, max pooling can be computed in parallel with convolution by a separate pooling unit, which improves throughput. A prototype accelerator was implemented in TSMC 65 nm technology with a core size of 5 mm². The accelerator supports major CNNs and achieves 152 GOPS peak throughput and 434 GOPS/W energy efficiency at 350 mW, making it a promising hardware accelerator for intelligent IoT devices.
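The abstract attributes the throughput gain to a separate pooling unit that works alongside convolution. The software sketch below is an analogy only. It assumes 2×2, stride-2 max pooling and models the streaming pipeline with Python generators. The convolution stage emits output rows one at a time. The pooling stage buffers a single pending row and emits a pooled row as soon as two convolution rows are available, so pooling overlaps with convolution and the full feature map is never stored. Generators interleave the two stages rather than running them concurrently. The names `conv_rows` and `maxpool2x2_stream` are hypothetical and do not describe the paper's circuit.

```python
from typing import Iterable, Iterator, List, Optional

Row = List[int]


def conv_rows(image: List[Row], kernel: List[Row]) -> Iterator[Row]:
    """Streaming convolution stage: yield one output row at a time (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    for i in range(len(image) - kh + 1):
        yield [sum(image[i + u][j + v] * kernel[u][v]
                   for u in range(kh) for v in range(kw))
               for j in range(len(image[0]) - kw + 1)]


def maxpool2x2_stream(rows: Iterable[Row]) -> Iterator[Row]:
    """Separate pooling stage (assumed 2x2 window, stride 2) consuming rows as they arrive.
    Only one pending row is buffered; a trailing odd row or column is dropped."""
    pending: Optional[Row] = None
    for row in rows:
        if pending is None:
            pending = row
            continue
        # Vertical max of the row pair, then horizontal max over non-overlapping column pairs.
        colmax = [max(a, b) for a, b in zip(pending, row)]
        yield [max(colmax[j], colmax[j + 1]) for j in range(0, len(colmax) - 1, 2)]
        pending = None


if __name__ == "__main__":
    img = [[(r * 7 + c * 3) % 11 - 5 for c in range(9)] for r in range(9)]
    sobel_x = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
    # Chaining the generators interleaves the stages: each pooled row is produced
    # as soon as two convolution rows exist.
    for pooled_row in maxpool2x2_stream(conv_rows(img, sobel_x)):
        print(pooled_row)
```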
Taxonomy
Topics: Advanced Neural Network Applications · CCD and CMOS Imaging Sensors · Advanced Memory and Neural Computing
Methods: Max Pooling · Convolution
