A Flexible Precision Scaling Deep Neural Network Accelerator with Efficient Weight Combination

Liang Zhao; Kunming Shao; Fengshi Tian; Tim Kwang-Ting Cheng; Chi-Ying Tsui; Yi Zou

arXiv:2502.00687·cs.AR·February 4, 2025

TL;DR

This paper presents a flexible deep neural network accelerator supporting mixed-precision inference from 2 to 8 bits, optimizing hardware utilization and energy efficiency for edge devices.

Contribution

It introduces a hardware architecture that combines weight decomposition, bit-serial MAC operations, and an efficient carry-save adder tree to support continuous precision scaling.
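To make the bit-serial MAC idea concrete, here is a minimal software sketch, not the paper's hardware. It assumes unsigned activations fed LSB first, one bit-plane per cycle, with the weights held in parallel. The function name `bit_serial_dot` and its parameters are hypothetical and chosen only for illustration.

```python
def bit_serial_dot(weights, activations, act_bits):
    """Dot product with activations consumed one bit-plane at a time.

    Assumptions (not from the paper): activations are unsigned integers
    that fit in act_bits bits, and bit-planes arrive LSB first.
    """
    acc = 0
    for b in range(act_bits):
        # Bit-plane b holds bit b of every activation.
        plane = [(a >> b) & 1 for a in activations]
        # Each weight is either added or skipped, so no multiplier is needed.
        partial = sum(w for w, bit in zip(weights, plane) if bit)
        # Weight the partial sum by the significance of this bit-plane.
        acc += partial << b
    return acc


weights = [3, -2, 1]
activations = [5, 7, 2]
result = bit_serial_dot(weights, activations, act_bits=3)
reference = sum(w * a for w, a in zip(weights, activations))
```

In this sketch, lowering the activation precision simply shortens the loop, which is the property that makes bit-serial designs attractive for variable precision.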

Findings

01. Achieves 4.09 TOPS peak throughput at 2-bit precision.

02. Attains 68.94 TOPS/W energy efficiency.

03. Supports flexible mixed-precision neural network inference.

Abstract

Deploying mixed-precision neural networks on edge devices reduces hardware resource usage and power consumption. Supporting fully mixed-precision inference requires flexible hardware accelerators that can operate at continuously varying precision. However, previous designs suffer from low hardware utilization and from the overhead of reconfigurable logic. In this paper, we propose an efficient accelerator for 2- to 8-bit precision scaling with serial activation input and preloaded parallel weights. First, we define two loading modes for the weight operands and decompose each weight into the corresponding bitwidths, which efficiently extends the supported weight precision. Then, to improve hardware utilization for low-precision operations, we design an architecture that performs bit-serial MAC operations with a systolic dataflow and combines the partial sums spatially.…
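The weight decomposition and partial-sum combination described in the abstract can be illustrated with a small arithmetic sketch. It assumes unsigned weights and a fixed slice width. The excerpt does not specify the paper's two loading modes or its actual slicing scheme, so the names `split_weight` and `combined_dot` and the 2-bit slice width are hypothetical.

```python
def split_weight(w, total_bits, slice_bits):
    """Split an unsigned weight into slice_bits-wide slices, LSB first."""
    mask = (1 << slice_bits) - 1
    n_slices = -(-total_bits // slice_bits)  # ceiling division
    return [(w >> (i * slice_bits)) & mask for i in range(n_slices)]


def combined_dot(weights, activations, total_bits, slice_bits):
    """Dot product computed per weight slice, then combined by shifting.

    Assumption (not from the paper): weights are unsigned and every
    slice position is processed by its own low-precision MAC lane.
    """
    slices = [split_weight(w, total_bits, slice_bits) for w in weights]
    n_slices = len(slices[0]) if slices else 0
    acc = 0
    for i in range(n_slices):
        # Low-precision partial sum for slice position i.
        partial = sum(s[i] * a for s, a in zip(slices, activations))
        # Shift by the slice's significance before accumulating.
        acc += partial << (i * slice_bits)
    return acc


weights = [200, 17, 255]
activations = [3, 5, 1]
result = combined_dot(weights, activations, total_bits=8, slice_bits=2)
reference = sum(w * a for w, a in zip(weights, activations))
```

Because each slice is only `slice_bits` wide, the same low-precision lanes can serve either several independent low-precision weights or one higher-precision weight, which is the utilization argument the abstract makes.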

Taxonomy

Topics: Image Processing Techniques and Applications