SeVeDo: A Heterogeneous Transformer Accelerator for Low-Bit Inference via Hierarchical Group Quantization and SVD-Guided Mixed Precision
Yuseon Choi, Sangjin Kim, Jungjun Oh, Byeongcheol Kim, and Hoi-Jun Yoo

TL;DR
SeVeDo is an energy-efficient transformer accelerator that combines hierarchical group quantization and SVD-guided mixed precision to reduce energy consumption while maintaining accuracy in low-bit inference.
Contribution
It introduces a novel heterogeneous architecture with hierarchical quantization and SVD-based mixed precision for efficient low-bit transformer inference.
Findings
Achieves 13.8 TOPS/W peak energy efficiency.
Surpasses conventional designs on ViT-Base and Llama2-7B.
Maintains high accuracy with reduced energy consumption.
Abstract
Low-bit quantization is a promising technique for efficient transformer inference because it reduces computational and memory overhead. However, aggressive bitwidth reduction remains challenging because activation outliers degrade accuracy. Existing methods, such as outlier handling and group quantization, achieve high accuracy but incur substantial energy consumption. To address this, we propose SeVeDo, an energy-efficient SVD-based heterogeneous accelerator that structurally separates outlier-sensitive components into a high-precision low-rank path, while the remaining computations are executed in a low-bit residual datapath with group quantization. To further enhance efficiency, Hierarchical Group Quantization (HGQ) combines coarse-grained floating-point scaling with fine-grained shifting, effectively reducing dequantization cost. Also, SVD-guided mixed precision (SVD-MP)…
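The two ideas the abstract describes, a high-precision low-rank path plus a group-quantized low-bit residual, and a two-level scale made of a coarse floating-point factor and a fine power-of-two shift, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the group sizes (128 and 32), the 4-bit width, and the `max_shift` cap are all assumptions chosen for illustration, and the shift rule is only one plausible reading of "fine-grained shifting". The sketch also applies the split to a single matrix, leaving aside how the accelerator treats activations at run time.

```python
import numpy as np

def split_low_rank(w, rank):
    """Split w into its best rank-`rank` approximation and the residual.
    The low-rank part would stay in high precision; the residual goes low-bit."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank, :]
    return low_rank, w - low_rank

def group_quantize(x, bits=4, group_size=32):
    """Plain symmetric group quantization: one float scale per group.
    Assumes the last dimension is a multiple of group_size.
    Returns the dequantized values so the error can be inspected."""
    qmax = 2 ** (bits - 1) - 1
    groups = x.reshape(-1, group_size)
    gmax = np.abs(groups).max(axis=1, keepdims=True)
    scale = np.where(gmax == 0, 1.0, gmax / qmax)
    q = np.clip(np.round(groups / scale), -qmax, qmax)
    return (q * scale).reshape(x.shape)

def hierarchical_group_quantize(x, bits=4, coarse=128, fine=32, max_shift=7):
    """Hypothetical two-level scheme: one float scale per coarse group,
    refined per fine group by a power-of-two shift (scale / 2**shift).
    Assumes the last dimension is a multiple of `coarse`."""
    qmax = 2 ** (bits - 1) - 1
    blocks = x.reshape(-1, coarse // fine, fine)
    fine_max = np.abs(blocks).max(axis=2, keepdims=True)
    coarse_max = fine_max.max(axis=1, keepdims=True)
    coarse_scale = np.where(coarse_max == 0, 1.0, coarse_max / qmax)
    # Largest shift that still keeps every value of the fine group in range.
    ratio = np.maximum(coarse_max, 1e-30) / np.maximum(fine_max, 1e-30)
    shift = np.minimum(np.floor(np.log2(ratio)), max_shift)
    fine_scale = coarse_scale * 2.0 ** (-shift)
    q = np.clip(np.round(blocks / fine_scale), -qmax, qmax)
    return (q * fine_scale).reshape(x.shape)
```

In this sketch, a matrix `w` would be approximated as `low_rank + hierarchical_group_quantize(residual)`, with `low_rank, residual = split_low_rank(w, rank)`. The point of the two-level scale is that only the coarse factor needs a floating-point multiply during dequantization, while the per-fine-group adjustment reduces to a bit shift.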
Taxonomy
Topics: Advanced Neural Network Applications · Numerical Methods and Algorithms · Sparse and Compressive Sensing Techniques
