SeVeDo: A Heterogeneous Transformer Accelerator for Low-Bit Inference via Hierarchical Group Quantization and SVD-Guided Mixed Precision

Yuseon Choi, Sangjin Kim, Jungjun Oh, Byeongcheol Kim, and Hoi-Jun Yoo

arXiv:2512.12930 · cs.LG · January 23, 2026

Open Access

TL;DR

SeVeDo is an energy-efficient transformer accelerator that combines hierarchical group quantization and SVD-guided mixed precision to reduce energy consumption while maintaining accuracy in low-bit inference.

Contribution

It introduces a novel heterogeneous architecture with hierarchical quantization and SVD-based mixed precision for efficient low-bit transformer inference.

Findings

1. Achieves 13.8 TOPS/W peak energy efficiency.
2. Surpasses conventional designs on ViT-Base and Llama2-7B.
3. Maintains high accuracy with reduced energy consumption.

Abstract

Low-bit quantization is a promising technique for efficient transformer inference by reducing computational and memory overhead. However, aggressive bitwidth reduction remains challenging due to activation outliers, leading to accuracy degradation. Existing methods, such as outlier-handling and group quantization, achieve high accuracy but incur substantial energy consumption. To address this, we propose SeVeDo, an energy-efficient SVD-based heterogeneous accelerator that structurally separates outlier-sensitive components into a high-precision low-rank path, while the remaining computations are executed in a low-bit residual datapath with group quantization. To further enhance efficiency, Hierarchical Group Quantization (HGQ) combines coarse-grained floating-point scaling with fine-grained shifting, effectively reducing dequantization cost. Also, SVD-guided mixed precision (SVD-MP)…
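
The two ideas named in the abstract, a high-precision low-rank path plus a low-bit residual path, and group scales built from one coarse floating-point scale with fine-grained shifts, can be illustrated with a small NumPy sketch. This is an illustration under stated assumptions, not the paper's implementation: the function names (`split_low_rank`, `hierarchical_group_quantize`, `dequantize`), the group and block sizes, the 4-bit width, the shift cap, and the rule used to pick each shift are all hypothetical choices made for the example.

```python
import numpy as np


def split_low_rank(w, rank):
    """Split w into a rank-`rank` part (kept in float) and a residual."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank, :]
    return low_rank, w - low_rank


def hierarchical_group_quantize(x, bits=4, group=8, groups_per_block=4, max_shift=7):
    """Quantize a 1-D array with one float scale per block and one shift per group.

    Effective scale of a group = block_scale / 2**shift (assumed scheme).
    """
    qmax = 2 ** (bits - 1) - 1
    assert x.size % (group * groups_per_block) == 0
    xb = x.reshape(-1, groups_per_block, group)

    group_max = np.abs(xb).max(axis=2)                # (blocks, groups)
    block_max = group_max.max(axis=1, keepdims=True)  # (blocks, 1)

    # Guard against all-zero blocks and groups so the ratios stay finite.
    safe_block = np.where(block_max > 0, block_max, 1.0)
    safe_group = np.where(group_max > 0, group_max, safe_block)

    block_scale = safe_block / qmax                   # coarse floating-point scale
    # Largest power-of-two refinement that still keeps the group maximum in range.
    ratio = safe_block / safe_group                   # always >= 1
    shift = np.clip(np.floor(np.log2(ratio)), 0, max_shift).astype(np.int64)

    group_scale = block_scale / 2.0 ** shift          # (blocks, groups)
    q = np.clip(np.rint(xb / group_scale[..., None]), -qmax, qmax).astype(np.int8)
    return q, block_scale, shift


def dequantize(q, block_scale, shift):
    """Rebuild floats: integer value times the block scale, shifted right per group."""
    return (q * (block_scale / 2.0 ** shift)[..., None]).reshape(-1)


# Toy weight matrix with one outlier column (synthetic data for illustration only).
rng = np.random.default_rng(0)
w = rng.normal(size=(16, 32))
w[:, 3] *= 20.0

low_rank, residual = split_low_rank(w, rank=2)
q, block_scale, shift = hierarchical_group_quantize(residual.reshape(-1))
approx = low_rank + dequantize(q, block_scale, shift).reshape(w.shape)
max_error = np.abs(approx - w).max()
```

In this sketch each group stores only a small integer shift, so dequantization needs one floating-point multiply per block and a shift per group rather than a floating-point multiply per group. Because every group scale is at most its block scale, the rounding error of any element is bounded by half the block scale.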

Taxonomy

Topics: Advanced Neural Network Applications · Numerical Methods and Algorithms · Sparse and Compressive Sensing Techniques