PIMfused: Near-Bank DRAM-PIM with Fused-layer Dataflow for CNN Data Transfer Optimization

Simei Yang; Xinyu Shi; Lu Zhao; Yunyu Ling; Quanjun Wang; Francky Catthoor

arXiv:2511.07985·cs.AR·November 12, 2025

TL;DR

PIMfused introduces a fused-layer dataflow for near-bank DRAM-PIM architectures. It accelerates CNNs by reducing cross-bank data transfers, which improves performance and energy efficiency while also reducing area.

Contribution

The paper proposes a hardware-software co-design whose fused-layer dataflow breaks inter-bank dependencies during CNN processing in DRAM-PIM systems.

Findings

01  Achieves 69.4% reduction in memory cycles with 4-bank PIMcores.

02  Reduces energy consumption to 83.4% of baseline.

03  Cuts area to 76.5% of baseline.
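
The three findings mix two conventions: memory cycles are reported as a reduction by a percentage, while energy and area are reported as a percentage of the baseline. The short sketch below puts all three on both scales using plain arithmetic. The helper names are hypothetical and the only inputs are the figures quoted above.

```python
def remaining_after_reduction(reduction_pct):
    """A reduction *by* p percent leaves (100 - p) percent of the baseline."""
    return 100.0 - reduction_pct


def reduction_from_remaining(remaining_pct):
    """Dropping *to* p percent of the baseline is a (100 - p) percent cut."""
    return 100.0 - remaining_pct


# Figures quoted in the findings list.
cycles_remaining = remaining_after_reduction(69.4)  # memory cycles
energy_cut = reduction_from_remaining(83.4)         # energy
area_cut = reduction_from_remaining(76.5)           # area

print(f"memory cycles: {cycles_remaining:.1f}% of baseline")
print(f"energy: {energy_cut:.1f}% below baseline")
print(f"area: {area_cut:.1f}% below baseline")
```

Read this way, the memory-cycle saving is by far the largest of the three, which fits the paper's focus on cross-bank data transfers.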

Abstract

Near-bank Processing-in-Memory (PIM) architectures integrate processing cores (PIMcores) close to DRAM banks to mitigate the high cost of off-chip memory accesses. When accelerating convolutional neural networks (CNNs) on DRAM-PIM, performance is often constrained by cross-bank (or cross-PIMcore) data transfers. These transfers are induced by the conventional layer-by-layer dataflow, which enforces inter-bank (or inter-PIMcore) dependencies across successive CNN layers. To address this challenge, we propose PIMfused, a hardware-software co-design that enables fused-layer dataflow for end-to-end CNN execution in near-bank DRAM-PIM. By adopting fused-layer dataflow, PIMfused improves data reuse and, more importantly, breaks inter-bank data dependencies, thereby optimizing cross-bank data transfers without sacrificing bank-level parallelism. We study the impact of buffer sizes and PIMcore parallelism…
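
To make the contrast between layer-by-layer and fused-layer dataflow concrete, the sketch below runs two stacked 1D convolutions both ways in plain Python. It is a toy illustration under stated assumptions, not the PIMfused design: the "banks" are just slices of a list, and the function names, kernel values, and tiling scheme are hypothetical. In the fused version each bank reads its input tile plus a small halo, computes both layers locally, and never needs another bank's intermediate results.

```python
def conv1d_valid(x, w):
    """1D 'valid' cross-correlation of list x with kernel w."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k))
            for i in range(len(x) - k + 1)]


def layer_by_layer(x, w1, w2):
    """Compute all of layer 1 first, then all of layer 2."""
    return conv1d_valid(conv1d_valid(x, w1), w2)


def fused_per_bank(x, w1, w2, num_banks):
    """Each hypothetical bank computes its slice of the final output alone."""
    halo = (len(w1) - 1) + (len(w2) - 1)   # extra inputs needed per tile
    out_len = len(x) - halo
    tile = -(-out_len // num_banks)        # ceiling division
    out, inputs_read = [], 0
    for b in range(num_banks):
        start, stop = b * tile, min((b + 1) * tile, out_len)
        if start >= stop:
            break
        local_in = x[start:stop + halo]         # input tile plus halo
        inputs_read += len(local_in)
        local_mid = conv1d_valid(local_in, w1)  # intermediate stays local
        out.extend(conv1d_valid(local_mid, w2))
    return out, inputs_read


x = [(i * i) % 7 for i in range(20)]
w1, w2 = [1, 0, -1], [1, 2, 1]
fused, inputs_read = fused_per_bank(x, w1, w2, num_banks=4)
assert fused == layer_by_layer(x, w1, w2)
print("input elements read with tiling:", inputs_read, "of", len(x))
```

The fused result matches the layer-by-layer result, but the tiles overlap, so some inputs are read and some intermediate values are computed more than once. That redundancy is the price of removing the dependency between banks. How it trades off against buffer size and PIMcore parallelism is the kind of question the abstract says the paper studies.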

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Advanced Memory and Neural Computing