PIMfused: Near-Bank DRAM-PIM with Fused-layer Dataflow for CNN Data Transfer Optimization
Simei Yang, Xinyu Shi, Lu Zhao, Yunyu Ling, Quanjun Wang, Francky Catthoor

TL;DR
PIMfused introduces a fused-layer dataflow in near-bank DRAM-PIM architectures to enhance CNN acceleration by reducing cross-bank data transfers, improving performance, energy efficiency, and area.
Contribution
It proposes a novel hardware-software co-design with fused-layer dataflow that breaks inter-bank dependencies for CNN processing in DRAM-PIM systems.
Findings
Achieves 69.4% reduction in memory cycles with 4-bank PIMcores.
Reduces energy consumption to 83.4% of baseline.
Cuts area to 76.5% of baseline.
Abstract
Near-bank Processing-in-Memory (PIM) architectures integrate processing cores (PIMcores) close to DRAM banks to mitigate the high cost of off-chip memory accesses. When accelerating convolutional neural network (CNN) on DRAM-PIM, performance is often constrained by cross-bank (or cross-PIMcore) data transfers, which are induced by the conventional layer-by-layer dataflow that enforces inter-bank (or inter-PIMcore) dependencies across successive CNN layers. To address this challenge, we propose PIMfused, a hardware-software co-design that enables fused-layer dataflow for end-to-end CNN execution in near-bank DRAM-PIM. By adopting fused-layer dataflow, PIMfused improves data reuse and, more importantly, breaks inter-bank data dependencies, thereby optimizing cross-bank data transfers without sacrificing bank-level parallelism. We study the impact of buffer sizes and PIMcore parallelism…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsParallel Computing and Optimization Techniques · Advanced Neural Network Applications · Advanced Memory and Neural Computing
