Generalized Ping-Pong: Off-Chip Memory Bandwidth Centric Pipelining Strategy for Processing-In-Memory Accelerators
Ruibao Wang, Bonan Yan

TL;DR
This paper introduces a generalized ping-pong pipelining strategy for processing-in-memory accelerators that maximizes off-chip memory bandwidth utilization, significantly improving performance for large deep neural network models.
Contribution
It proposes a novel off-chip memory bandwidth centric pipelining strategy called generalized ping-pong, enhancing PIM efficiency for large DNNs beyond conventional methods.
Findings
Achieves over 1.67x acceleration with full bandwidth utilization.
Provides 1.22x to 7.71x acceleration under bandwidth constraints.
Demonstrates improved PIM performance through quantitative analysis.
Abstract
Processing-in-memory (PIM) is a promising choice for accelerating deep neural networks (DNNs) featuring high efficiency and low power. However, the rapid upscaling of neural network model sizes poses a crucial challenge for the limited on-chip PIM capacity. When the PIM presumption of "pre-loading DNN weights/parameters only once before repetitive computing" is no longer practical, concurrent writing and computing techniques become necessary for PIM. Conventional methods of naive ping-pong or in situ concurrent write/compute scheduling for PIM cause low utilization of off-chip memory bandwidth, subsequently offsetting the efficiency gain brought by PIM technology. To address this challenge, we propose an off-chip memory bandwidth centric pipelining strategy, named "generalized ping-pong", to maximize the utilization and performance of PIM accelerators toward large DNN models. The core…
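To make the bandwidth problem described in the abstract concrete, the following is a minimal toy model of naive two-bank ping-pong, not the paper's analysis. It assumes that in each phase one bank is rewritten from off-chip memory while the other computes, and that the banks swap only when both have finished. The function name and the parameters t_write and t_compute are hypothetical, introduced only for illustration.

```python
def naive_ping_pong_phase(t_write: float, t_compute: float) -> dict:
    """Toy steady-state model of naive two-bank ping-pong (illustrative assumption).

    t_write:   time to rewrite one bank's weights over the off-chip link
    t_compute: time one bank spends computing on its loaded weights
    """
    if t_write <= 0 or t_compute <= 0:
        raise ValueError("t_write and t_compute must be positive")

    # The banks swap roles only when both the write and the compute are done,
    # so the slower of the two sets the phase length.
    phase_time = max(t_write, t_compute)

    return {
        "phase_time": phase_time,
        # Fraction of each phase during which the off-chip link moves data.
        "link_busy_fraction": t_write / phase_time,
        # Fraction of each phase during which the computing bank is active.
        "compute_busy_fraction": t_compute / phase_time,
    }


if __name__ == "__main__":
    # Hypothetical numbers: computing takes 4x longer than rewriting a bank.
    print(naive_ping_pong_phase(t_write=1.0, t_compute=4.0))
```

Under these assumptions, whenever computing takes longer than rewriting a bank, the off-chip link sits idle for the rest of the phase, which is the low bandwidth utilization the abstract attributes to conventional scheduling.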
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques · Distributed and Parallel Computing Systems
