An Energy-Efficient Near-Data Processing Accelerator for DNNs that Optimizes Data Accesses

Bahareh Khabbazan; Marc Riera; Antonio González

arXiv:2310.18181 · cs.AR · October 30, 2023 · 1 citation

Open Access

TL;DR

This paper introduces QeiHaN, a near-data processing accelerator utilizing 3D-stacked memory and logarithmic activation quantization to significantly reduce memory accesses, improve speed, and save energy in DNN inference.

Contribution

QeiHaN is a novel hardware accelerator that employs a memory-centric weight storage scheme and implicit in-memory bit-shifting to optimize DNN inference efficiency.
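To illustrate why logarithmic activation quantization pairs naturally with bit-shifting, here is a minimal software sketch. It assumes activations are rounded to the nearest power of two in the log domain and that weights are plain integers. The function names (`log2_quantize`, `shift_multiply`, `dot_shift`) and the exponent range are hypothetical and chosen for illustration only; this is not QeiHaN's hardware datapath or its weight storage scheme.

```python
import math

def log2_quantize(x, min_exp=-8, max_exp=7):
    """Approximate x as sign * 2**exp; returns (sign, exp), or (0, None) for zero.

    The exponent range is an illustrative assumption, not a value from the paper.
    """
    if x == 0:
        return 0, None
    sign = 1 if x > 0 else -1
    exp = round(math.log2(abs(x)))          # nearest power of two in the log domain
    exp = max(min_exp, min(max_exp, exp))   # clamp to the representable range
    return sign, exp

def shift_multiply(weight, sign, exp):
    """Multiply an integer weight by sign * 2**exp using only a shift and a negation."""
    if sign == 0:
        return 0
    # A negative exponent becomes a right shift, which discards low-order bits.
    shifted = weight << exp if exp >= 0 else weight >> -exp
    return sign * shifted

def dot_shift(weights, activations):
    """Dot product in which every multiplication is replaced by a shift."""
    total = 0
    for w, a in zip(weights, activations):
        sign, exp = log2_quantize(a)
        total += shift_multiply(w, sign, exp)
    return total
```

In this sketch, an activation of 6.0 is approximated as 2^3, so multiplying a weight of 5 by it reduces to `5 << 3`. The approximation error (6 versus 8 here) is the accuracy cost that any such quantization scheme has to manage.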

Findings

01 Reduces memory accesses by 25%

02 Achieves 4.3x speedup over baseline

03 Provides 3.5x energy savings

Abstract

The constant growth of DNNs makes them challenging to implement and run efficiently on traditional compute-centric architectures. Some accelerators have attempted to add more compute units and on-chip buffers to solve the memory wall problem without much success, and sometimes even worsening the issue since more compute units also require higher memory bandwidth. Prior works have proposed the design of memory-centric architectures based on the Near-Data Processing (NDP) paradigm. NDP seeks to break the memory wall by moving the computations closer to the memory hierarchy, reducing the data movements and their cost as much as possible. The 3D-stacked memory is especially appealing for DNN accelerators due to its high-density/low-energy storage and near-memory computation capabilities to perform the DNN operations massively in parallel. However, memory accesses remain as the main…

Taxonomy

Topics: Advanced Memory and Neural Computing · Advanced Neural Network Applications · Parallel Computing and Optimization Techniques