Hybrid SLC-MLC RRAM Mixed-Signal Processing-in-Memory Architecture for Transformer Acceleration via Gradient Redistribution
Chang Eun Song, Priyansh Bhatnagar, Zihan Xia, Nam Sung Kim, Tajana Rosing, Mingu Kang

TL;DR
This paper introduces HyFlexPIM, a hybrid RRAM-based processing-in-memory architecture for transformer acceleration that combines single-level cell (SLC) and multi-level cell (MLC) RRAM with a gradient redistribution algorithm to balance accuracy and efficiency.
Contribution
The paper presents a novel hybrid SLC-MLC RRAM PIM architecture, paired with a gradient redistribution algorithm, for accelerating transformer inference.
Findings
Achieves up to 1.86× higher throughput.
Achieves up to 1.45× higher energy efficiency.
Effectively balances accuracy and efficiency using hybrid RRAM and gradient redistribution.
Abstract
Transformers, while revolutionary, face challenges due to their demanding computational cost and large data movement. To address this, we propose HyFlexPIM, a novel mixed-signal processing-in-memory (PIM) accelerator for inference that flexibly utilizes both single-level cell (SLC) and multi-level cell (MLC) RRAM technologies to trade off accuracy and efficiency. HyFlexPIM achieves efficient dual-mode operation by using digital PIM for high-precision and write-intensive operations and analog PIM for highly parallel, low-precision computations. The analog PIM further distributes tasks between SLC and MLC operation: a single analog PIM module can be reconfigured to switch between the two modes with minimal overhead (<1% in area and energy). Critical weights are allocated to SLC RRAM for high accuracy, while less critical weights are assigned to MLC RRAM to…
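The dual-mode split described in the abstract can be illustrated with a short sketch. This is a minimal, hypothetical mapper, not the authors' implementation: `Operation`, `choose_pim_mode`, `row_importance`, and `split_slc_mlc` are illustrative names, the example operation names are assumptions about which workloads are write-intensive, and the per-row score |w · g| (weight times gradient) is an assumed criticality proxy, since the truncated abstract does not state how criticality is measured or how gradient redistribution works.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Operation:
    # Hypothetical descriptor of a transformer workload.
    name: str
    needs_high_precision: bool
    write_intensive: bool


def choose_pim_mode(op: Operation) -> str:
    """Route high-precision or write-intensive ops to digital PIM, the rest to analog PIM."""
    if op.needs_high_precision or op.write_intensive:
        return "digital"
    return "analog"


def row_importance(weights: list[list[float]], grads: list[list[float]]) -> list[float]:
    """Assumed criticality proxy: sum of |w * g| over each weight row."""
    return [sum(abs(w * g) for w, g in zip(w_row, g_row))
            for w_row, g_row in zip(weights, grads)]


def split_slc_mlc(importance: list[float], slc_fraction: float) -> list[bool]:
    """Return one flag per row: True -> SLC (critical), False -> MLC (less critical)."""
    if not 0.0 <= slc_fraction <= 1.0:
        raise ValueError("slc_fraction must be in [0, 1]")
    n = len(importance)
    n_slc = round(slc_fraction * n)
    # Rank rows by descending importance; ties keep their original order (stable sort).
    order = sorted(range(n), key=lambda i: importance[i], reverse=True)
    slc_rows = set(order[:n_slc])
    return [i in slc_rows for i in range(n)]


if __name__ == "__main__":
    # Hypothetical workloads: dynamically written attention operands vs. static weights.
    ops = [
        Operation("attention_scores", needs_high_precision=False, write_intensive=True),
        Operation("ffn_projection", needs_high_precision=False, write_intensive=False),
    ]
    for op in ops:
        print(op.name, "->", choose_pim_mode(op))

    W = [[0.5, -1.0], [0.1, 0.2], [2.0, 0.3], [-0.3, 0.4]]
    G = [[0.2, 0.1], [0.0, 0.5], [0.4, 0.9], [1.0, -1.0]]
    mask = split_slc_mlc(row_importance(W, G), slc_fraction=0.5)
    print("SLC rows:", [i for i, is_slc in enumerate(mask) if is_slc])
```

Rows are ranked by score and the top `slc_fraction` go to SLC; lowering that fraction places more weights in denser MLC arrays, trading some accuracy for efficiency.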
