An Energy-Efficient Near-Data Processing Accelerator for DNNs that Optimizes Data Accesses
Bahareh Khabbazan, Marc Riera, Antonio González

TL;DR
This paper introduces QeiHaN, a near-data processing accelerator utilizing 3D-stacked memory and logarithmic activation quantization to significantly reduce memory accesses, improve speed, and save energy in DNN inference.
Contribution
QeiHaN is a novel hardware accelerator that employs a memory-centric weight storage scheme and implicit in-memory bit-shifting to optimize DNN inference efficiency.
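Logarithmic activation quantization pairs naturally with bit-shifting, and the minimal Python sketch below shows why. It is not QeiHaN's actual encoding or its in-memory shift mechanism. It assumes that activations are non-negative integers (e.g., post-ReLU) and that each one is replaced by an exponent e, with log2 of the activation rounded to the nearest integer and clamped to a hypothetical range [0, MAX_EXP]. Weights are assumed to be plain integers. Under these assumptions, every product w · 2^e becomes a left shift, w << e, so no multiplier is needed. The names `log2_quantize`, `shift_dot`, and `MAX_EXP` are illustrative only.

```python
import math
from typing import List, Optional

# Hypothetical exponent range (assumption); a real design would derive it
# from the number of bits reserved for the activation exponent.
MAX_EXP = 7


def log2_quantize(a: int) -> Optional[int]:
    """Map a non-negative integer activation to a power-of-two exponent.

    Rounds log2(a) to the nearest integer and clamps it to [0, MAX_EXP].
    Returns None for zero, since log2(0) is undefined and a zero
    activation needs its own encoding.
    """
    if a <= 0:
        return None
    e = round(math.log2(a))
    return min(max(e, 0), MAX_EXP)


def shift_dot(weights: List[int], exps: List[Optional[int]]) -> int:
    """Dot product in which each multiply w * 2**e becomes a shift w << e."""
    acc = 0
    for w, e in zip(weights, exps):
        if e is None:
            continue  # a zero activation contributes nothing to the sum
        # For Python ints, w << e equals w * 2**e, including negative w.
        acc += w << e
    return acc


if __name__ == "__main__":
    acts = [0, 1, 3, 8, 200]
    weights = [5, -2, 3, 1, 2]
    exps = [log2_quantize(a) for a in acts]
    print(exps)                      # [None, 0, 2, 3, 7]
    print(shift_dot(weights, exps))  # 274
```

In this example the quantized result (274) differs from the exact dot product (415). The gap comes from rounding 3 to 4 and clamping 200 to 128. This is the accuracy-for-efficiency trade-off that any log-domain quantizer has to manage.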
Findings
Reduces memory accesses by 25%
Achieves 4.3x speedup over baseline
Provides 3.5x energy savings
Abstract
The constant growth of DNNs makes them challenging to implement and run efficiently on traditional compute-centric architectures. Some accelerators have attempted to solve the memory wall problem by adding more compute units and on-chip buffers, without much success; this sometimes even worsens the issue, since more compute units also require higher memory bandwidth. Prior works have proposed memory-centric architectures based on the Near-Data Processing (NDP) paradigm. NDP seeks to break the memory wall by moving computation closer to the memory hierarchy, reducing data movement and its cost as much as possible. 3D-stacked memory is especially appealing for DNN accelerators due to its high-density/low-energy storage and its near-memory computation capabilities, which allow DNN operations to be performed massively in parallel. However, memory accesses remain the main…
Taxonomy
Topics: Advanced Memory and Neural Computing · Advanced Neural Network Applications · Parallel Computing and Optimization Techniques
