Hybrid SLC-MLC RRAM Mixed-Signal Processing-in-Memory Architecture for Transformer Acceleration via Gradient Redistribution

Chang Eun Song; Priyansh Bhatnagar; Zihan Xia; Nam Sung Kim; Tajana Rosing; Mingu Kang

arXiv:2506.00020·cs.AR·June 3, 2025


TL;DR

This paper introduces HyFlexPIM, a hybrid RRAM-based processing-in-memory architecture for transformer acceleration that combines SLC and MLC technologies with a gradient redistribution algorithm to optimize accuracy and efficiency.

Contribution

The paper presents a novel hybrid SLC-MLC RRAM PIM architecture, together with a gradient redistribution algorithm, for accelerating transformer inference.

Findings

01 Achieves up to 1.86× higher throughput.

02 Achieves up to 1.45× higher energy efficiency.

03 Effectively balances accuracy and efficiency using hybrid RRAM and gradient redistribution.

Abstract

Transformers, while revolutionary, face challenges due to their high computational cost and heavy data movement. To address this, we propose HyFlexPIM, a novel mixed-signal processing-in-memory (PIM) accelerator for inference that flexibly uses both single-level cell (SLC) and multi-level cell (MLC) RRAM technologies to trade off accuracy and efficiency. HyFlexPIM achieves efficient dual-mode operation by using digital PIM for high-precision and write-intensive operations and analog PIM for highly parallel, low-precision computations. The analog PIM further distributes tasks between SLC and MLC operations, and a single analog PIM module can be reconfigured to switch between the two modes with minimal overhead (<1% in area and energy). Critical weights are allocated to SLC RRAM for high accuracy, while less critical weights are assigned to MLC RRAM to…
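The abstract describes placing critical weights in SLC RRAM and less critical ones in MLC RRAM, but this excerpt does not spell out how criticality is measured or how gradient redistribution works. The following is only a minimal sketch of that final allocation step under stated assumptions: per-row importance is estimated with a first-order saliency sum of |w · ∂L/∂w| (a common heuristic swapped in here, not necessarily the authors' criterion), weights are 8-bit, and each MLC cell stores 2 bits. The functions `row_importance`, `split_slc_mlc`, and `cells_needed` are hypothetical names for illustration.

```python
from typing import List, Tuple


def row_importance(weights: List[List[float]], grads: List[List[float]]) -> List[float]:
    # Hypothetical saliency: for each weight row, sum |w * dL/dw| over its entries
    # (first-order Taylor estimate of the loss change if the row were perturbed).
    return [sum(abs(w * g) for w, g in zip(w_row, g_row))
            for w_row, g_row in zip(weights, grads)]


def split_slc_mlc(scores: List[float], slc_fraction: float) -> Tuple[List[int], List[int]]:
    # Assign the top `slc_fraction` of rows (by importance) to SLC, the rest to MLC.
    if not 0.0 <= slc_fraction <= 1.0:
        raise ValueError("slc_fraction must be in [0, 1]")
    n_slc = int(round(slc_fraction * len(scores)))
    # Rank rows by descending importance; ties broken by row index for determinism.
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return sorted(order[:n_slc]), sorted(order[n_slc:])


def cells_needed(n_slc_rows: int, n_mlc_rows: int, row_len: int,
                 bits_per_weight: int = 8, mlc_bits_per_cell: int = 2) -> int:
    # Assumption: SLC stores 1 bit per cell; MLC stores `mlc_bits_per_cell` bits per cell.
    slc_cells = n_slc_rows * row_len * bits_per_weight
    mlc_cells_per_weight = -(-bits_per_weight // mlc_bits_per_cell)  # ceiling division
    mlc_cells = n_mlc_rows * row_len * mlc_cells_per_weight
    return slc_cells + mlc_cells


if __name__ == "__main__":
    # Toy 4x2 weight matrix and gradients (illustrative values only).
    weights = [[0.5, -0.2], [0.1, 0.1], [-0.9, 0.3], [0.05, 0.0]]
    grads = [[0.4, 0.1], [0.2, -0.1], [0.5, 0.6], [0.1, 0.9]]
    slc, mlc = split_slc_mlc(row_importance(weights, grads), slc_fraction=0.5)
    print("SLC rows:", slc, "MLC rows:", mlc)
    print("cells (hybrid):", cells_needed(len(slc), len(mlc), row_len=2))
    print("cells (all SLC):", cells_needed(len(weights), 0, row_len=2))
```

With half of the rows kept in SLC, the toy matrix needs 48 cells instead of 64 for an all-SLC layout, which illustrates the density side of the accuracy/efficiency trade-off the abstract describes. The SLC fraction is the knob such a scheme would tune.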
