SignRoundV2: Toward Closing the Performance Gap in Extremely Low-Bit Post-Training Quantization for LLMs

Wenhua Cheng; Weiwei Zhang; Heng Guo; Haihao Shen; Zaner Ma

arXiv:2512.04746 · cs.CL · May 19, 2026

TL;DR

SignRoundV2 is a post-training quantization framework that reduces the performance loss of extremely low-bit LLMs through an adaptive mixed-precision strategy and lightweight stabilization techniques.

Contribution

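It introduces an adaptive mixed-precision strategy that uses gradient information and quantization-induced reconstruction errors to guide layer-wise bit allocation, together with lightweight stabilization techniques (loss filtering and a pre-tuning scale search) that improve tuning in extremely low-bit regimes.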

Findings

01. Achieves near-lossless performance in mixed MXFP settings.

02. Narrows the performance gap to approximately 1% at 4.5 bits.

03. Improves accuracy in 2-bit weight-only quantization.

Abstract

Extremely low-bit quantization is critical for efficiently deploying Large Language Models (LLMs), yet it often leads to severe performance degradation at 2 bits and even at 4 bits (e.g., MXFP4). We present SignRoundV2, a post-training quantization framework designed to maintain high performance even under aggressive compression. SignRoundV2 introduces (1) a simple yet efficient adaptive mixed-precision strategy that leverages gradient information and quantization-induced reconstruction errors to guide layer-wise bit allocation, and (2) a set of lightweight stabilization techniques, including loss filtering and a pre-tuning scale search, to improve tuning effectiveness in extremely low-bit regimes. Our approach takes a significant step toward closing the performance gap between quantized and full-precision models. Experimental results across diverse LLMs demonstrate that SignRoundV2…
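
The layer-wise bit allocation idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the sensitivity score (sum of |gradient × quantization error|), the greedy upgrade rule, the integer round-to-nearest quantizer, and all names (`quantize_rtn`, `sensitivity`, `allocate_bits`) are assumptions chosen for illustration. It shows only how gradient information and quantization error could be combined to rank layers under an average-bit budget.

```python
import numpy as np


def quantize_rtn(w, bits, group_size=32):
    """Symmetric round-to-nearest quantize-dequantize per group.

    Hypothetical stand-in quantizer; w.size must be divisible by group_size.
    """
    flat = w.reshape(-1, group_size)
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(flat).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero
    q = np.clip(np.round(flat / scale), -qmax - 1, qmax)
    return (q * scale).reshape(w.shape)


def sensitivity(w, grad, bits):
    """Assumed first-order proxy: sum of |grad * (Q(w) - w)|."""
    return float(np.sum(np.abs(grad * (quantize_rtn(w, bits) - w))))


def allocate_bits(layers, choices=(2, 4, 8), target_avg_bits=4.5):
    """Greedy allocation: start every layer at the lowest bit width, then
    repeatedly upgrade the layer with the best sensitivity reduction per
    extra bit spent, while the average bit width stays within the target.

    layers maps a layer name to a (weight, gradient) pair of arrays.
    """
    names = list(layers)
    sizes = {n: layers[n][0].size for n in names}
    budget = target_avg_bits * sum(sizes.values())
    score = {n: [sensitivity(*layers[n], b) for b in choices] for n in names}
    level = {n: 0 for n in names}
    used = sum(sizes[n] * choices[0] for n in names)

    while True:
        best, best_gain = None, 0.0
        for n in names:
            i = level[n]
            if i + 1 >= len(choices):
                continue  # already at the highest bit width
            extra = sizes[n] * (choices[i + 1] - choices[i])
            if used + extra > budget:
                continue  # upgrade would exceed the bit budget
            gain = (score[n][i] - score[n][i + 1]) / extra
            if gain > best_gain:
                best, best_gain = n, gain
        if best is None:
            break
        i = level[best]
        used += sizes[best] * (choices[i + 1] - choices[i])
        level[best] += 1

    return {n: choices[level[n]] for n in names}


# Toy example: three equally sized layers whose gradients differ in magnitude.
rng = np.random.default_rng(0)
layers = {
    name: (rng.standard_normal((64, 64)), g_scale * rng.standard_normal((64, 64)))
    for name, g_scale in [("a", 100.0), ("b", 1.0), ("c", 0.01)]
}
bits = allocate_bits(layers, choices=(2, 4, 8), target_avg_bits=4.5)
print(bits)
```

In this toy setup the layer with the largest gradients is upgraded first, so it ends up with more bits than the least sensitive layer while the parameter-weighted average stays within the budget. MXFP formats are floating-point block formats, so the integer quantizer here is a simplification.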

Code & Models

Repositories

intel/auto-round (GitHub)

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Data Compression Techniques · Adversarial Robustness in Machine Learning