eCNN: A Block-Based and Highly-Parallel CNN Accelerator for Edge Inference

Chao-Tsung Huang; Yu-Chun Ding; Huan-Ching Wang; Chi-Wen Weng; Kai-Ping Lin; Li-Wei Wang; Li-De Chen

arXiv:1910.05680·cs.DC·October 15, 2019

TL;DR

This paper introduces eCNN, a highly parallel, block-based CNN accelerator for edge inference that supports ultra-high-resolution video with reduced DRAM bandwidth and power consumption.

Contribution

The paper proposes a block-based inference flow, a hardware-oriented network model (ERNet), and a coarse-grained instruction set (FBISA), and integrates them into an embedded processor, eCNN, for efficient CNN inference at the edge.

Findings

01 Supports 4K Ultra-HD video at 30 fps with low power consumption

02 Eliminates DRAM bandwidth for feature maps using block-based inference

03 Outperforms the state of the art in power efficiency and resolution support

Abstract

Convolutional neural networks (CNNs) have recently demonstrated superior quality for computational imaging applications. Therefore, they have great potential to revolutionize the image pipelines on cameras and displays. However, it is difficult for conventional CNN accelerators to support ultra-high-resolution videos at the edge due to their considerable DRAM bandwidth and power consumption. Therefore, finding a further memory- and computation-efficient microarchitecture is crucial to speed up this coming revolution. In this paper, we approach this goal by considering the inference flow, network model, instruction set, and processor design jointly to optimize hardware performance and image quality. We apply a block-based inference flow which can eliminate all the DRAM bandwidth for feature maps and accordingly propose a hardware-oriented network model, ERNet, to optimize image quality…
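The idea behind block-based inference can be illustrated with a minimal sketch. It is not the eCNN implementation and does not model ERNet or FBISA. It assumes a toy network of stacked 3x3 "valid" convolutions with integer weights, and the helper names (`conv3x3_valid`, `run_layers`, `block_based_inference`) are hypothetical. Each output block is computed from an input tile enlarged by a halo of one pixel per layer on every side, so intermediate feature maps never grow beyond tile size and could, in principle, stay in on-chip memory.

```python
# Illustrative sketch only: hypothetical helpers, not the eCNN design.

def conv3x3_valid(img, kernel):
    """3x3 'valid' convolution on a 2D list; output shrinks by 2 per axis."""
    h, w = len(img), len(img[0])
    return [
        [
            sum(img[y + dy][x + dx] * kernel[dy][dx]
                for dy in range(3) for dx in range(3))
            for x in range(w - 2)
        ]
        for y in range(h - 2)
    ]


def run_layers(img, kernels):
    """Apply the convolution layers in sequence."""
    for k in kernels:
        img = conv3x3_valid(img, k)
    return img


def block_based_inference(img, kernels, block=4):
    """Run the whole layer stack one block at a time.

    Each input tile carries a halo of len(kernels) pixels per side, so the
    stitched result equals full-frame inference.
    """
    halo = len(kernels)
    h, w = len(img), len(img[0])
    out_h, out_w = h - 2 * halo, w - 2 * halo
    out = [[0] * out_w for _ in range(out_h)]
    for by in range(0, out_h, block):
        for bx in range(0, out_w, block):
            bh = min(block, out_h - by)
            bw = min(block, out_w - bx)
            # Input tile = output block plus the halo on every side.
            tile = [row[bx:bx + bw + 2 * halo]
                    for row in img[by:by + bh + 2 * halo]]
            res = run_layers(tile, kernels)  # intermediates stay tile-sized
            for y in range(bh):
                out[by + y][bx:bx + bw] = res[y]
    return out


# Small synthetic frame and two arbitrary integer kernels (assumed values).
frame = [[(3 * y + 5 * x) % 7 for x in range(14)] for y in range(12)]
kernels = [
    [[1, 0, -1], [2, 0, -2], [1, 0, -1]],
    [[0, 1, 0], [1, -4, 1], [0, 1, 0]],
]

full = run_layers(frame, kernels)
blocked = block_based_inference(frame, kernels, block=4)
assert blocked == full  # block-wise result matches full-frame inference
```

The trade-off in this sketch is that neighboring tiles overlap in their halos, so some pixels are recomputed; in exchange, no full-frame intermediate feature map ever has to be stored.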

Taxonomy

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Convolution