PTQ1.61: Push the Real Limit of Extremely Low-Bit Post-Training Quantization Methods for Large Language Models

Jiaqi Zhao; Miao Zhang; Ming Wang; Yuzhang Shang; Kaihao Zhang; Weili Guan; Yaowei Wang; Min Zhang

arXiv:2502.13179 · cs.LG · August 7, 2025

TL;DR

This paper introduces PTQ1.61, a novel extremely low-bit post-training quantization method for large language models, achieving 1.61-bit weight quantization with minimal additional overhead and state-of-the-art performance.
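A quick bit-budget calculation shows where a figure like 1.61 bits can come from. The sketch below assumes a hypothetical split: a fraction of input channels is stored at 4 bits, the rest at 1 bit, and a one-dimensional mask spends 1 bit per input channel. The 20% salient fraction and the 4096-row layer are illustrative assumptions, not values reported on this page, and scaling-factor storage is not counted, so this is not a reconstruction of the paper's exact accounting.

```python
def average_bits(salient_fraction: float, d_out: int,
                 high_bits: int = 4, low_bits: int = 1) -> float:
    """Average storage bits per weight for a d_out x d_in layer (hypothetical model).

    Salient input channels use high_bits, the rest use low_bits, and a
    one-dimensional mask spends 1 bit per input channel. Scaling factors
    are deliberately ignored in this sketch.
    """
    weight_bits = salient_fraction * high_bits + (1 - salient_fraction) * low_bits
    # d_in mask bits are shared by d_out * d_in weights -> 1 / d_out bits per weight
    mask_bits = 1 / d_out
    return weight_bits + mask_bits


print(round(average_bits(0.2, 4096), 4))  # prints 1.6002
```

With 4096 output rows, the mask term alone is 1/4096, or about 0.0002 bit per weight, which is the order of overhead the abstract quotes for its structured mask.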

Contribution

The paper presents the first 1.61-bit PTQ method, incorporating structured masking, block-wise scaling, and a new quantization preprocessing paradigm to push the limits of low-bit quantization.

Findings

01

Achieves state-of-the-art results in extremely low-bit PTQ.

02

Introduces a structured mask with negligible overhead.

03

Demonstrates the effectiveness of quantization preprocessing.

Abstract

Large Language Models (LLMs) suffer severe performance degradation under extremely low-bit (sub-2-bit) quantization. Several existing sub-2-bit post-training quantization (PTQ) methods use a mixed-precision scheme that relies on an unstructured, fine-grained mask to explicitly distinguish salient weights, which introduces an extra 1 bit or more per weight. To explore the real limit of PTQ, we propose an extremely low-bit PTQ method called PTQ1.61, which enables 1.61-bit weight quantization for the first time. Specifically, we first introduce a one-dimensional structured mask, derived from input activations with the goal of reducing the upper bound of the quantization error, which adds a negligible 0.0002 bit per weight and allocates the corresponding salient weight channels to 4-bit. For the binarization of non-salient channels, an efficient block-wise scaling factors optimization…
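The abstract describes two mechanisms: a one-dimensional mask over weight channels, chosen from input activations, that keeps salient channels at 4 bits, and block-wise scaled binarization for the remaining channels. The pure-Python sketch below shows the general shape of such a scheme under assumptions that are not taken from the paper: salience is scored as the mean absolute calibration activation per input channel, the 4-bit path is plain asymmetric round-to-nearest per row, and each binarization block uses the closed-form scale alpha = mean|w| (the least-squares optimum for sign binarization) instead of the paper's block-wise scaling-factor optimization. All function names and parameters (`ratio`, `block_size`) are hypothetical.

```python
from statistics import mean


def channel_salience(activations):
    """Assumed salience score: mean |x| per input channel over calibration tokens."""
    d_in = len(activations[0])
    return [mean(abs(tok[j]) for tok in activations) for j in range(d_in)]


def structured_mask(salience, ratio):
    """One bit per input channel; True marks a salient channel kept at 4 bits."""
    k = max(1, round(ratio * len(salience)))
    top = sorted(range(len(salience)), key=lambda j: salience[j], reverse=True)[:k]
    chosen = set(top)
    return [j in chosen for j in range(len(salience))]


def fake_quantize_4bit(values):
    """Asymmetric round-to-nearest 4-bit quantization, returned dequantized."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 15 or 1.0  # 16 levels; guard against a constant group
    return [lo + round((v - lo) / scale) * scale for v in values]


def binarize_blockwise(values, block_size):
    """sign(w) * alpha per block, with alpha = mean |w| (closed-form, not the paper's)."""
    out = []
    for start in range(0, len(values), block_size):
        chunk = values[start:start + block_size]
        alpha = mean(abs(v) for v in chunk)
        out.extend(alpha if v >= 0 else -alpha for v in chunk)
    return out


def quantize_layer(weight, mask, block_size=2):
    """weight: d_out rows x d_in columns. Masked columns -> 4-bit, others -> binary."""
    salient = [j for j, m in enumerate(mask) if m]
    rest = [j for j, m in enumerate(mask) if not m]
    result = [row[:] for row in weight]  # leave the input untouched
    for i, row in enumerate(weight):
        if salient:
            for j, v in zip(salient, fake_quantize_4bit([row[j] for j in salient])):
                result[i][j] = v
        if rest:
            for j, v in zip(rest, binarize_blockwise([row[j] for j in rest], block_size)):
                result[i][j] = v
    return result


acts = [[0.1, 3.0, 0.2, 0.1], [0.2, 2.5, 0.1, 0.3]]  # 2 calibration tokens, d_in = 4
weight = [[0.5, -1.2, -0.3, 0.1], [-0.4, 0.9, 0.2, -0.6]]  # d_out = 2
mask = structured_mask(channel_salience(acts), ratio=0.25)
print(mask)  # [False, True, False, False]
quantized = quantize_layer(weight, mask, block_size=2)
```

Because the mask stores one bit per input column rather than one per weight, its cost is amortized over every output row. That is the structural reason it can be far cheaper than the unstructured, per-weight masks the abstract contrasts it with.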

Code & Models

Repositories

zjq0455/ptq1.61 · PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques