Res-DPU: Resource-shared Digital Processing-in-memory Unit for Edge-AI Workloads
Mukul Lokhande, Narendra Singh Dhakad, Seema Chouhan, Akash Sankhe, and Santosh Kumar Vishvakarma

TL;DR
Res-DPU introduces a resource-shared digital PIM unit with reduced transistor count and power consumption, significantly improving energy efficiency and scalability for edge AI workloads.
Contribution
It proposes a novel resource-shared digital PIM architecture that reduces transistor count and power consumption, together with a new approximate multiplication method for edge AI acceleration.
Findings
Achieves 0.43 TOPS throughput and 87.22 TOPS/W energy efficiency.
Reduces transistor count by up to 56% and power consumption by 21.35%.
Maintains 96.85% QoR on CNN models with 30% pruning.
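Throughput and energy efficiency together determine power draw, because TOPS/W = TOPS ÷ W. The minimal Python sketch below shows that arithmetic. It assumes both reported figures were measured at the same operating point, which the summary does not state. The function name `implied_power_mw` is illustrative.

```python
def implied_power_mw(throughput_tops: float, efficiency_tops_per_w: float) -> float:
    """Return the power in milliwatts implied by throughput (TOPS) and efficiency (TOPS/W).

    Assumes both figures describe the same operating point (not stated in the source).
    """
    # P [W] = throughput [TOPS] / efficiency [TOPS/W]; multiply by 1000 to get mW.
    return throughput_tops / efficiency_tops_per_w * 1000.0


if __name__ == "__main__":
    # Figures taken from the Findings list: 0.43 TOPS and 87.22 TOPS/W.
    p = implied_power_mw(0.43, 87.22)
    print(f"Implied power: {p:.2f} mW")  # Implied power: 4.93 mW
```

Under that assumption, the unit draws roughly 4.93 mW.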
Abstract
Processing-in-memory (PIM) has emerged as the go-to solution for addressing the von Neumann bottleneck in edge AI accelerators. However, state-of-the-art (SoTA) digital PIM approaches suffer from low compute density, primarily because bulky bit cells and transistor-heavy adder trees limit macro scalability and energy efficiency. This work introduces Res-DPU, a resource-shared digital PIM unit built from a dual-port 5T SRAM latch and shared 2T AND compute logic. This reduces the per-bit multiplication cost to just 5.25T and cuts the transistor count of the PIM array by up to 56% compared with SoTA works. Furthermore, a Transistor-Reduced 2D Interspersed Adder Tree (TRAIT) with FA-7T and PG-FA-26T reduces the power consumption of the adder tree by up to 21.35% and improves energy efficiency by 59% compared with conventional 28T RCA designs. We…
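The 5.25T figure is consistent with one 2T AND gate being shared by eight bit cells (5 + 2/8 = 5.25). The sharing factor of eight is inferred here and is not stated in the abstract. The Python sketch below checks that arithmetic. It also models the generic dataflow of an AND-based digital PIM dot product: each 1-bit input is ANDed with stored weight bits, each column is summed by an adder tree, and the partial sums are shift-accumulated. The names `per_bit_transistors`, `adder_tree`, and `and_tree_dot` are hypothetical. This is a textbook bit-serial scheme, not the Res-DPU circuit, TRAIT, or the paper's approximate multiplier.

```python
from typing import List


def per_bit_transistors(cell_t: int = 5, and_t: int = 2, share: int = 8) -> float:
    """Transistors per stored bit when one AND gate serves `share` bit cells.

    share=8 is an assumption, chosen because it reproduces the reported 5.25T.
    """
    return cell_t + and_t / share


def adder_tree(values: List[int]) -> int:
    """Sum values pairwise, level by level, as a binary adder tree would."""
    level = list(values)
    while len(level) > 1:
        if len(level) % 2:  # pad an odd-length level with a zero
            level.append(0)
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0] if level else 0


def and_tree_dot(inputs: List[int], weights: List[int],
                 in_bits: int = 4, w_bits: int = 4) -> int:
    """Unsigned dot product built only from 1-bit AND products and adder trees.

    Inputs are streamed bit-serially (LSB first); each weight bit is one column.
    """
    acc = 0
    for i in range(in_bits):      # one input bit-plane per cycle
        for j in range(w_bits):   # weight bit columns (parallel in hardware, looped here)
            # 1-bit products: input bit i AND weight bit j, one per row
            products = [((x >> i) & 1) & ((w >> j) & 1) for x, w in zip(inputs, weights)]
            # reduce the column with an adder tree, then scale by 2^(i + j)
            acc += adder_tree(products) << (i + j)
    return acc


if __name__ == "__main__":
    print(per_bit_transistors())           # 5.25
    # Baseline implied if the 56% reduction is read as a per-bit figure (an assumption)
    print(round(5.25 / (1 - 0.56), 2))     # 11.93
    x, w = [3, 7, 1, 15], [2, 5, 9, 4]
    print(and_tree_dot(x, w), sum(a * b for a, b in zip(x, w)))  # 110 110
```

Read per bit, the 56% claim would correspond to a baseline of roughly 12T per bit. The abstract applies the figure to the whole PIM array, so this is only a consistency check.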
Taxonomy
Topics: Advanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices · Parallel Computing and Optimization Techniques
