Res-DPU: Resource-shared Digital Processing-in-memory Unit for Edge-AI Workloads

Mukul Lokhande, Narendra Singh Dhakad, Seema Chouhan, Akash Sankhe, and Santosh Kumar Vishvakarma

arXiv:2510.19260 · cs.AR · October 23, 2025

TL;DR

Res-DPU introduces a resource-shared digital PIM unit with reduced transistor count and power consumption, significantly improving energy efficiency and scalability for edge AI workloads.

Contribution

The paper proposes a resource-shared digital PIM architecture that reduces transistor count and power consumption, together with a new approximate multiplication method for edge AI acceleration.

Findings

1. Achieves 0.43 TOPS throughput and 87.22 TOPS/W energy efficiency.

2. Reduces transistor count by up to 56% and power consumption by 21.35%.

3. Maintains 96.85% QoR on CNN models with 30% pruning.
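As a rough sanity check on the throughput and efficiency figures, the short Python sketch below derives the power draw they imply. It assumes, and the listing does not confirm this, that both numbers were reported at the same operating point; the function name `implied_power_mw` is illustrative only.

```python
def implied_power_mw(throughput_tops: float, efficiency_tops_per_w: float) -> float:
    """Power in milliwatts implied by dividing throughput by energy efficiency.

    Assumption: both figures describe the same operating point.
    """
    watts = throughput_tops / efficiency_tops_per_w
    return watts * 1e3


# Figures quoted in the findings: 0.43 TOPS and 87.22 TOPS/W.
power_mw = implied_power_mw(0.43, 87.22)
print(f"{power_mw:.2f} mW")  # prints "4.93 mW"
```

Under that assumption the unit would draw on the order of 5 mW, which is consistent with an edge-class power budget.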

Abstract

Processing-in-memory (PIM) has emerged as the go-to solution for addressing the von Neumann bottleneck in edge AI accelerators. However, state-of-the-art (SoTA) digital PIM approaches suffer from low compute density, primarily due to bulky bit cells and transistor-heavy adder trees, which limit macro scalability and energy efficiency. This work introduces Res-DPU, a resource-shared digital PIM unit with a dual-port 5T SRAM latch and shared 2T AND compute logic. This reduces the per-bit multiplication cost to just 5.25T and reduces the transistor count of the PIM array by up to 56% over SoTA works. Furthermore, a Transistor-Reduced 2D Interspersed Adder Tree (TRAIT) with FA-7T and PG-FA-26T reduces the power consumption of the adder tree by up to 21.35% and improves energy efficiency by 59% compared to conventional 28T RCA designs. We…
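The 5.25T per-bit figure is consistent with a 5T cell plus a 2T AND gate amortized over several cells. The Python sketch below works through that arithmetic and shows how an AND gate can act as a 1-bit multiplier in a bit-serial dot product. Two points are assumptions rather than statements from the abstract: the sharing factor of 8 cells per AND gate is inferred from 5 + 2/8 = 5.25, and `and_dot` is a generic digital-PIM illustration (with Python's `sum` standing in for an adder tree), not the Res-DPU circuit or the TRAIT adder. All function and parameter names are hypothetical.

```python
from fractions import Fraction


def per_bit_transistors(cell_t: int, shared_t: int, cells_per_shared: int) -> Fraction:
    """Transistors per bit when shared compute logic is amortized over several cells."""
    return cell_t + Fraction(shared_t, cells_per_shared)


def and_dot(weight_bits, activations, act_bits=4):
    """Bit-serial dot product of 1-bit weights with unsigned multi-bit activations."""
    total = 0
    for b in range(act_bits):
        # One activation bit per cycle; AND with the stored weight bit is a 1-bit multiply.
        partial = sum(w & ((a >> b) & 1) for w, a in zip(weight_bits, activations))
        # Shift-and-accumulate the partial sum according to bit significance.
        total += partial << b
    return total


# Assumed sharing factor of 8 (inferred, not stated in the abstract).
cost = per_bit_transistors(cell_t=5, shared_t=2, cells_per_shared=8)
print(float(cost))  # prints 5.25

# 1*3 + 0*7 + 1*2 + 1*5 = 10
print(and_dot([1, 0, 1, 1], [3, 7, 2, 5]))  # prints 10
```

Using `Fraction` keeps the amortized cost exact, so other sharing factors can be compared without floating-point rounding.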

Taxonomy

Topics: Advanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices · Parallel Computing and Optimization Techniques