FLARE: FP-Less PTQ and Low-ENOB ADC Based AMS-PiM for Error-Resilient, Fast, and Efficient Transformer Acceleration
Donghyeon Yi, Seoyoung Lee, Jongho Kim, Junyoung Kim, Sohmyung Ha, Ik Joon Chang, and Minkyu Je

TL;DR
This paper introduces RAP, an AMS-PiM architecture that eliminates the need for high-ENOB ADCs and for the dequantization-quantization (DQ-Q) processes of conventional post-training quantization, improving the energy efficiency, speed, and error resilience of transformer acceleration.
Contribution
RAP is the first AMS-PiM design to remove DQ-Q processes, utilize low-ENOB ADCs, and incorporate nonlinear processing, enabling more efficient and scalable transformer deployment.
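For context on why ADC resolution matters: an AMS-PiM column sums many products in the analog domain, and reading that sum out without loss requires an ADC with enough bits to cover its full range. The sketch below is a generic illustration under simplifying assumptions (unsigned operands, and an ideal uniform ADC whose ENOB equals its nominal bit count). The function names are hypothetical, and the code does not model RAP's circuits.

```python
def lossless_adc_bits(n_rows: int, x_bits: int, w_bits: int) -> int:
    """Bits needed to read out an unsigned n_rows-term dot product exactly.

    Worst case: every input and every weight at its maximum code.
    """
    max_sum = n_rows * (2**x_bits - 1) * (2**w_bits - 1)
    # Codes 0..max_sum must all be distinct, so we need max_sum's bit length.
    return max_sum.bit_length()


def ideal_adc(value: float, full_scale: float, enob: int) -> int:
    """Hypothetical ideal uniform ADC: map [0, full_scale] onto 2**enob codes."""
    top_code = 2**enob - 1
    clipped = min(max(value, 0.0), full_scale)  # out-of-range inputs saturate
    return round(clipped / full_scale * top_code)


def adc_to_value(code: int, full_scale: float, enob: int) -> float:
    """Map an ADC code back to the analog value it stands for."""
    return code / (2**enob - 1) * full_scale


if __name__ == "__main__":
    # 8-bit inputs and weights summed over 128 rows need a 23-bit readout.
    print(lossless_adc_bits(128, 8, 8))
    # Even 1-bit (bit-serial) inputs and weights over 128 rows need 8 bits.
    print(lossless_adc_bits(128, 1, 1))
    # A 6-bit ADC on the 8-bit case resolves steps of about 132,000.
    fs = 128 * 255 * 255
    code = ideal_adc(5_000_000, fs, 6)
    print(code, adc_to_value(code, fs, 6))
```

The gap between a lossless readout (tens of bits) and a low-ENOB ADC (a few bits) shows why a design that tolerates low-ENOB conversion can avoid the high-resolution ADCs that conventional PTQ-based AMS-PiM depends on.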
Findings
RAP achieves higher energy efficiency than GPUs and traditional PiM architectures.
RAP reduces latency and improves accuracy in transformer processing.
The architecture demonstrates robustness against PVT (process, voltage, and temperature) variations and hardware inefficiencies.
Abstract
Encoder-based transformers, powered by self-attention layers, have revolutionized machine learning with their context-aware representations. However, their quadratic growth in computational and memory demands presents significant bottlenecks. Analog-Mixed-Signal Process-in-Memory (AMS-PiM) architectures address these challenges by enabling efficient on-chip processing. Traditionally, AMS-PiM relies on Quantization-Aware Training (QAT), which is hardware-efficient but requires extensive retraining to adapt models to AMS-PiMs, making it increasingly impractical for transformer models. Post-Training Quantization (PTQ) mitigates this training overhead but introduces significant hardware inefficiencies. PTQ relies on dequantization-quantization (DQ-Q) processes, floating-point units (FPUs), and high-ENOB (Effective Number of Bits) analog-to-digital converters (ADCs). In particular, high-ENOB…
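The DQ-Q overhead described in the abstract can be illustrated with a generic int8 PTQ layer hand-off. The first function below is the floating-point dequantize-then-quantize path. The second folds the three scales into one integer multiplier and a right shift, a widely used integer-only requantization technique. This is a swapped-in illustration of how floating-point work can be avoided in general, not a description of how the proposed design removes DQ-Q, which the abstract excerpt does not specify. The function names, the 31-bit shift, and the scale values are assumptions.

```python
import numpy as np


def dq_q_requantize(acc: np.ndarray, s_x: float, s_w: float, s_y: float) -> np.ndarray:
    """Conventional PTQ hand-off between layers, done in floating point (DQ-Q).

    acc: int32 accumulators of int8 x int8 products.
    s_x, s_w: input and weight scales; s_y: scale of the next layer's int8 input.
    """
    real = acc.astype(np.float64) * (s_x * s_w)  # DQ: integer -> real value (FP multiply)
    q = np.round(real / s_y)                     # Q: real value -> output grid (FP divide)
    return np.clip(q, -128, 127).astype(np.int8)


def to_fixed_point(m: float, shift: int = 31) -> tuple[int, int]:
    """Approximate a real multiplier 0 < m < 1 as m0 / 2**shift with integer m0."""
    m0 = int(round(m * (1 << shift)))
    return m0, shift


def integer_requantize(acc: np.ndarray, m0: int, shift: int) -> np.ndarray:
    """Same mapping using only an integer multiply, a rounding add, and a right shift."""
    prod = acc.astype(np.int64) * m0                # int64: no overflow for |acc|, m0 < 2**31
    rounded = (prod + (1 << (shift - 1))) >> shift  # add half, then arithmetic shift (floor)
    return np.clip(rounded, -128, 127).astype(np.int8)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    acc = rng.integers(-20_000, 20_000, size=1000, dtype=np.int32)
    s_x, s_w, s_y = 0.02, 0.005, 0.05            # hypothetical calibration scales
    m0, shift = to_fixed_point(s_x * s_w / s_y)  # fold all three scales into one integer
    a = dq_q_requantize(acc, s_x, s_w, s_y)
    b = integer_requantize(acc, m0, shift)
    diff = np.abs(a.astype(np.int16) - b.astype(np.int16))
    print("max difference:", diff.max(), "agreement:", np.mean(a == b))
```

In this setup the two paths differ by at most one code, and only where the real value falls exactly halfway between two codes, since the two paths break ties differently.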
