Generalized Ping-Pong: Off-Chip Memory Bandwidth Centric Pipelining Strategy for Processing-In-Memory Accelerators

Ruibao Wang; Bonan Yan

arXiv:2411.13054 · cs.AR · November 21, 2024

TL;DR

This paper introduces a generalized ping-pong pipelining strategy for processing-in-memory accelerators that maximizes off-chip memory bandwidth utilization, significantly improving performance for large deep neural network models.

Contribution

It proposes a novel off-chip memory bandwidth centric pipelining strategy called generalized ping-pong, enhancing PIM efficiency for large DNNs beyond conventional methods.

Findings

01. Achieves over 1.67x acceleration with full bandwidth utilization.

02. Provides 1.22x to 7.71x acceleration under bandwidth constraints.

03. Demonstrates improved PIM performance through quantitative analysis.

Abstract

Processing-in-memory (PIM) is a promising choice for accelerating deep neural networks (DNNs) featuring high efficiency and low power. However, the rapid upscaling of neural network model sizes poses a crucial challenge for the limited on-chip PIM capacity. When the PIM presumption of "pre-loading DNN weights/parameters only once before repetitive computing" is no longer practical, concurrent writing and computing techniques become necessary for PIM. Conventional methods of naive ping-pong or in situ concurrent write/compute scheduling for PIM cause low utilization of off-chip memory bandwidth, subsequently offsetting the efficiency gain brought by PIM technology. To address this challenge, we propose an off-chip memory bandwidth centric pipelining strategy, named "generalized ping-pong", to maximize the utilization and performance of PIM accelerators toward large DNN models. The core…
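
To make the bandwidth-utilization concern concrete, here is a minimal sketch of a toy steady-state model for naive ping-pong with two PIM banks: while one bank computes, the other is rewritten with new weights, and the banks swap once both are done. The function name, the parameters `t_write` and `t_compute`, and the model itself are assumptions for illustration only; they are not taken from the paper and do not describe the generalized ping-pong strategy.

```python
def naive_ping_pong_utilization(t_write: float, t_compute: float) -> float:
    """Toy model (an assumption, not the paper's analysis).

    Two banks alternate roles: one is written over the off-chip link
    for t_write time units while the other computes for t_compute.
    The banks swap only when both have finished, so each round lasts
    max(t_write, t_compute), and the off-chip link is busy for
    t_write of that round.
    """
    if t_write <= 0 or t_compute <= 0:
        raise ValueError("t_write and t_compute must be positive")
    round_length = max(t_write, t_compute)
    return t_write / round_length


if __name__ == "__main__":
    # Hypothetical timings: compute takes 4x as long as a bank rewrite.
    print(naive_ping_pong_utilization(t_write=1.0, t_compute=4.0))  # 0.25
    # Write-bound case: the link is always busy.
    print(naive_ping_pong_utilization(t_write=4.0, t_compute=1.0))  # 1.0
```

Under this simplified model, whenever computing on a bank takes longer than rewriting it, the off-chip link sits idle for the rest of each round, which is the kind of underutilization the abstract attributes to conventional scheduling.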

Code & Models

Repositories

rw999creator/gpp-pim (official)

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques · Distributed and Parallel Computing Systems