Energy-efficient Dense DNN Acceleration with Signed Bit-slice Architecture

Dongseok Im, Gwangtae Park, Zhiyong Li, Junha Ryu, and Hoi-Jun Yoo

arXiv:2203.07679 · cs.AR · March 16, 2022


Open Access

TL;DR

This paper introduces a signed bit-slice architecture that accelerates high-precision, dense DNNs on mobile SoCs by exploiting zero-valued signed bit-slices and balancing positive and negative values. It reports 3.65x higher area-efficiency compared to Bit-fusion, along with 3.88x higher energy-efficiency and 5.35x higher throughput.

Contribution

The paper proposes a signed bit-slice representation (SBR) and a matching architecture that accelerate dense DNNs by exploiting zero bit-slices and output speculation, outperforming the earlier Bit-fusion accelerator.

Findings

01 · 3.65x higher area-efficiency compared to Bit-fusion
02 · 3.88x higher energy-efficiency
03 · 5.35x higher throughput

Abstract

As the number of deep neural networks (DNNs) executed on a mobile system-on-chip (SoC) increases, the SoC struggles to accelerate them in real time within its limited hardware resources and power budget. Although previous mobile neural processing units (NPUs) take advantage of low-bit computing and sparsity exploitation, they are incapable of accelerating high-precision, dense DNNs. This paper proposes an energy-efficient signed bit-slice architecture that accelerates both high-precision and dense DNNs by exploiting the large number of zero values among signed bit-slices. The proposed signed bit-slice representation (SBR) changes a signed $1111_{2}$ bit-slice to $0000_{2}$ by borrowing a $1$ from its lower-order bit-slice. As a result, it generates a large number of zero bit-slices even in dense DNNs. Moreover, it balances the positive and negative values of 2's…
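The borrowing step described in the abstract can be illustrated with a small Python sketch. This is one plausible reading rather than the paper's exact encoding: the abstract does not give the slice layout, so the sketch assumes a signed top slice followed by lower slices that each carry 3 payload bits (radix 8), and the helper names `conventional_slices`, `signed_slices`, `reconstruct`, and `count_zero_slices` are hypothetical.

```python
# Sketch only: the slice widths and the radix-8 layout are assumptions,
# not details given in the abstract.
LOW_BITS = 3               # assumed payload bits per lower-order slice
RADIX = 1 << LOW_BITS      # 8


def conventional_slices(x, n_slices=2):
    """Split a two's-complement integer into slices, highest order first.

    The top slice is signed; every lower slice is an unsigned value 0..7.
    """
    slices = []
    for _ in range(n_slices - 1):
        slices.append(x & (RADIX - 1))  # unsigned low slice
        x >>= LOW_BITS                  # arithmetic shift keeps the sign
    slices.append(x)                    # signed top slice
    return slices[::-1]


def signed_slices(x, n_slices=2):
    """Re-encode so that every slice of a negative number is <= 0."""
    slices = conventional_slices(x, n_slices)
    if x < 0:
        for i in range(n_slices - 1, 0, -1):  # walk from the lowest slice up
            if slices[i] > 0:
                slices[i] -= RADIX   # the lower slice becomes negative
                slices[i - 1] += 1   # e.g. a top slice of -1 (1111) becomes 0 (0000)
    return slices


def reconstruct(slices):
    """Recombine slices (highest order first) into the original integer."""
    value = 0
    for s in slices:
        value = value * RADIX + s
    return value


def count_zero_slices(encode, values, n_slices=2):
    """Count zero-valued slices produced by `encode` over `values`."""
    return sum(s == 0 for v in values for s in encode(v, n_slices))
```

Under these assumptions, `conventional_slices(-1)` gives `[-1, 7]` while `signed_slices(-1)` gives `[0, -1]`: the all-ones top slice becomes a zero slice and the value is still recoverable with `reconstruct`. Small-magnitude negative values are the ones that gain zero slices, which matches the abstract's claim that zero bit-slices appear even in dense data.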

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · Parallel Computing and Optimization Techniques