PIM-GPT: A Hybrid Process-in-Memory Accelerator for Autoregressive Transformers

Yuting Wu; Ziyu Wang; Wei D. Lu

arXiv:2310.09385 · cs.AR · April 16, 2024 · 1 citation

Open Access

TL;DR

PIM-GPT is a DRAM-based process-in-memory accelerator that speeds up GPT inference and improves its energy efficiency by executing matrix operations directly within the memory chips.

Contribution

This work introduces PIM-GPT, the first DRAM-based PIM architecture tailored for GPT inference, combining hardware design and software mapping for end-to-end acceleration.

Findings

01 Achieves up to 137x speedup over GPU

02 Provides up to 602x energy efficiency improvement

03 Supports multiple GPT models with up to 1.4 billion parameters

Abstract

Decoder-only Transformer models such as GPT have demonstrated exceptional performance in text generation by autoregressively predicting the next token. However, the efficacy of running GPT on current hardware systems is bounded by a low compute-to-memory ratio and high memory access. Process-in-memory (PIM) architectures can minimize off-chip data movement and utilize high internal bandwidth, so they stand out as promising candidates for accelerating memory-bound tasks such as GPT inference. In this work, we propose a PIM accelerator, PIM-GPT, which achieves end-to-end acceleration of GPT inference with high performance and high energy efficiency. PIM-GPT leverages DRAM-based PIM designs to execute multiply-accumulate (MAC) operations directly in the DRAM chips, eliminating the need to move matrix data off-chip. Non-linear functions and data communication are supported by an…
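
The low compute-to-memory ratio mentioned in the abstract can be illustrated with a rough arithmetic-intensity estimate. The following is a minimal sketch, not a calculation from the paper: it assumes a single dense weight matrix, 16-bit values, and no data reuse beyond one pass over the operands. The function name `arithmetic_intensity`, the layer size, and the batch sizes are hypothetical choices made only for illustration.

```python
def arithmetic_intensity(rows: int, cols: int, batch: int = 1,
                         bytes_per_value: int = 2) -> float:
    """FLOPs per byte moved for Y = W @ X.

    W has shape (rows, cols); X has shape (cols, batch).
    Assumes every operand and result crosses the memory interface once.
    """
    flops = 2 * rows * cols * batch  # one multiply and one add per MAC
    values_moved = rows * cols + cols * batch + rows * batch  # W, X, Y
    return flops / (bytes_per_value * values_moved)


# Hypothetical layer size, chosen only for illustration.
d = 2048
decode = arithmetic_intensity(d, d, batch=1)     # one new token per step
prefill = arithmetic_intensity(d, d, batch=512)  # many tokens processed together

print(f"decode:  {decode:.2f} FLOPs/byte")
print(f"prefill: {prefill:.2f} FLOPs/byte")
```

Under these assumptions, generating one token at a time keeps the ratio near one FLOP per byte regardless of layer size, because each weight is read once and used for a single MAC. That is the regime in which moving the MAC into the DRAM chips avoids most of the off-chip traffic.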

Taxonomy

Topics: Advancements in Semiconductor Devices and Circuit Design · Semiconductor materials and devices · Advanced Data Storage Technologies