pQuant: Towards Effective Low-Bit Language Models via Decoupled Linear Quantization-Aware Training

Wenzheng Zhang; Bingzheng Liu; Yang Hu; Xiaoying Bai; Wentao Zhang; Bin Cui

arXiv:2602.22592 · cs.LG · February 27, 2026

TL;DR

pQuant introduces a decoupled linear quantization-aware training method that improves low-bit language model accuracy by separating sensitive parameters into a high-precision branch, enabling scalable and efficient quantization.

Contribution

The paper proposes pQuant, an approach that decouples parameters into two branches to improve low-bit quantization of large language models, addressing the homogenization of parameter sensitivity.

Findings

1. Achieves state-of-the-art results in extremely low-bit quantization.
2. Effectively preserves sensitive parameters with a high-precision branch.
3. Enables scalable and efficient low-bit LLM deployment.

Abstract

Quantization-Aware Training from scratch has emerged as a promising approach for building efficient large language models (LLMs) with extremely low-bit weights (sub-2-bit), which can offer substantial advantages for edge deployment. However, existing methods still fail to achieve satisfactory accuracy and scalability. In this work, we identify a parameter democratization effect as a key bottleneck: the sensitivity of all parameters becomes homogenized, severely limiting expressivity. To address this, we propose pQuant, a method that decouples parameters by splitting linear layers into two specialized branches: a dominant 1-bit branch for efficient computation and a compact high-precision branch dedicated to preserving the most sensitive parameters. Through tailored feature scaling, we explicitly guide the model to allocate sensitive parameters to the high-precision branch. Furthermore,…
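
The two-branch split described in the abstract can be pictured as one layer whose output is the sum of a binarized dominant branch and a small full-precision branch. What follows is a minimal, hypothetical sketch, not the authors' implementation. Several pieces are assumptions: the per-tensor absmean scale, the low-rank shape (`u`, `v`) of the high-precision branch, and the scalar `hp_scale` standing in for "tailored feature scaling". The names `DecoupledLinear`, `binarize`, and `ste_grad` are invented for illustration. The clipped straight-through estimator is a standard QAT technique for training through the sign function. This summary does not say pQuant uses it.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product: each row of w dotted with x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def binarize(w: Matrix) -> Tuple[Matrix, float]:
    """1-bit quantization: sign(w) (zero maps to +1) plus one per-tensor scale.

    Assumption: absmean scaling, alpha = mean(|w|); the paper's exact
    quantizer is not given in this summary.
    """
    flat = [abs(v) for row in w for v in row]
    alpha = sum(flat) / len(flat)
    signs = [[1.0 if v >= 0 else -1.0 for v in row] for row in w]
    return signs, alpha


def ste_grad(w: Matrix, grad_q: Matrix) -> Matrix:
    """Clipped straight-through estimator: copy the gradient w.r.t. the
    quantized weight to the latent weight where |w| <= 1, else zero it."""
    return [[g if abs(v) <= 1.0 else 0.0 for v, g in zip(wr, gr)]
            for wr, gr in zip(w, grad_q)]


class DecoupledLinear:
    """Hypothetical two-branch linear layer: y = alpha * sign(W) x + s * U (V x).

    w        -- latent full-precision weights of the dominant 1-bit branch
    u, v     -- compact high-precision branch, kept in full precision
                (the low-rank form is an illustrative assumption)
    hp_scale -- hypothetical scalar s weighting the high-precision branch
    """

    def __init__(self, w: Matrix, u: Matrix, v: Matrix, hp_scale: float = 1.0):
        self.w, self.u, self.v, self.hp_scale = w, u, v, hp_scale

    def forward(self, x: Vector) -> Vector:
        signs, alpha = binarize(self.w)                 # re-quantize every pass
        y1 = [alpha * s for s in matvec(signs, x)]      # 1-bit branch
        y2 = matvec(self.u, matvec(self.v, x))          # high-precision branch
        return [a + self.hp_scale * b for a, b in zip(y1, y2)]


layer = DecoupledLinear(
    w=[[0.5, -0.25], [-1.0, 0.25]],
    u=[[1.0], [0.0]],
    v=[[0.0, 2.0]],
    hp_scale=2.0,
)
print(layer.forward([1.0, 2.0]))  # → [7.5, 0.5]
```

The latent weights `w` stay in full precision and are binarized on every forward pass, which is what makes this quantization-aware training. During training an optimizer would update `w` through the STE. At inference only `signs`, `alpha`, and the small `u`, `v` branch are needed.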

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling