A Flexible Precision Scaling Deep Neural Network Accelerator with Efficient Weight Combination
Liang Zhao, Kunming Shao, Fengshi Tian, Tim Kwang-Ting Cheng, Chi-Ying Tsui, Yi Zou

TL;DR
This paper presents a flexible deep neural network accelerator supporting mixed-precision inference from 2 to 8 bits, optimizing hardware utilization and energy efficiency for edge devices.
Contribution
The paper introduces a hardware architecture that combines weight decomposition, bit-serial MAC operations, and an efficient carry-save adder tree to support continuous precision scaling.
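As a rough illustration of the carry-save idea named here, the sketch below reduces many operands with 3:2 compressors and performs only one carry-propagating addition at the end. It is a generic textbook construction, not the paper's adder tree; the function names `carry_save_add` and `csa_tree_sum` are hypothetical, and operands are assumed to be non-negative integers.

```python
def carry_save_add(a, b, c):
    """3:2 compressor: return (s, carry) such that a + b + c == s + carry."""
    s = a ^ b ^ c                                # bitwise sum without carries
    carry = ((a & b) | (a & c) | (b & c)) << 1   # majority bits, shifted to their weight
    return s, carry


def csa_tree_sum(values):
    """Sum non-negative ints by repeated 3:2 compression (illustrative only)."""
    terms = list(values)
    while len(terms) > 2:
        reduced = []
        # Compress each full group of three terms into two.
        for i in range(0, len(terms) - 2, 3):
            s, carry = carry_save_add(terms[i], terms[i + 1], terms[i + 2])
            reduced.extend([s, carry])
        # Pass through the one or two terms that did not fill a group.
        leftover = len(terms) % 3
        if leftover:
            reduced.extend(terms[-leftover:])
        terms = reduced
    # Single carry-propagate addition at the end of the tree.
    return sum(terms)
```

The point of the structure is that each compression level has constant delay regardless of operand width, so the expensive carry chain is paid once per accumulation rather than once per operand.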
Findings
Achieves 4.09 TOPS peak throughput at 2-bit precision.
Attains 68.94 TOPS/W energy efficiency.
Supports flexible mixed-precision neural network inference.
Abstract
Deploying mixed-precision neural networks on edge devices is economical in hardware resources and power consumption. Supporting fully mixed-precision neural network inference requires flexible hardware accelerators that handle continuously varying operand precision. However, previous works suffer from low hardware utilization and from the overhead of reconfigurable logic. In this paper, we propose an efficient accelerator for 2- to 8-bit precision scaling with serial activation input and preloaded parallel weights. First, we set two loading modes for the weight operands and decompose the weights into the corresponding bitwidths, which extends the weight precision support efficiently. Then, to improve hardware utilization for low-precision operations, we design an architecture that performs bit-serial MAC operations with a systolic dataflow, in which the partial sums are combined spatially.…
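The arithmetic behind serial activations and decomposed weights can be sketched in software. The example below is a minimal functional model under stated assumptions, not the paper's architecture: operands are unsigned, weights are split into fixed 2-bit chunks (the `chunk_bits` parameter is hypothetical, and the abstract does not specify the actual decomposition or the two loading modes), and the names `split_weight` and `bit_serial_mac` are invented for illustration.

```python
def split_weight(w, w_bits, chunk_bits=2):
    """Split an unsigned weight into chunk_bits-wide pieces, least significant first."""
    n_chunks = -(-w_bits // chunk_bits)  # ceiling division
    mask = (1 << chunk_bits) - 1
    return [(w >> (i * chunk_bits)) & mask for i in range(n_chunks)]


def bit_serial_mac(activations, weights, a_bits, w_bits, chunk_bits=2):
    """Dot product computed one activation bit at a time (unsigned operands assumed)."""
    n_chunks = -(-w_bits // chunk_bits)
    chunked = [split_weight(w, w_bits, chunk_bits) for w in weights]
    acc = 0
    for t in range(a_bits):              # one "cycle" per activation bit, LSB first
        for k in range(n_chunks):        # each weight chunk yields its own partial sum
            partial = sum(((a >> t) & 1) * wc[k]
                          for a, wc in zip(activations, chunked))
            # Combine partial sums by shifting to the chunk's and the bit's weight.
            acc += partial << (t + k * chunk_bits)
    return acc
```

In this model, lowering the activation precision shortens the outer loop and lowering the weight precision reduces the number of chunks, which is one way to see why a bit-serial, decomposed design scales smoothly across 2 to 8 bits.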
Taxonomy
Topics: Image Processing Techniques and Applications
