Rethinking Output Alignment For 1-bit Post-Training Quantization of Large Language Models

Dung Anh Hoang; Cuong Pham; Cuong Nguyen; Trung Le; Jianfei Cai; Thanh-Toan Do

arXiv:2512.21651·cs.LG·May 18, 2026

TL;DR

This paper introduces a post-training quantization method for 1-bit quantization of large language models. It targets two failure modes of output-driven quantization, error accumulation and anisotropic distortion, to improve performance.

Contribution

The paper proposes an output-driven PTQ approach that explicitly tackles layer-wise error accumulation and anisotropic representation distortion, outperforming existing 1-bit PTQ methods.

Findings

01

The proposed method outperforms existing 1-bit PTQ techniques in the paper's experiments.

02

Addressing error accumulation and anisotropic distortion is crucial for effective 1-bit quantization.

03

The approach maintains computational efficiency while improving model output fidelity.

Abstract

Large Language Models (LLMs) deliver strong performance across a wide range of NLP tasks, but their massive size hinders deployment on resource-constrained devices. To reduce their computational and memory burden, various compression techniques have been proposed, including quantization, pruning, and knowledge distillation. Among these, post-training quantization (PTQ) is widely adopted for its efficiency, as it requires no retraining and only a small calibration dataset, enabling low-cost deployment. Recent advances in post-training quantization have demonstrated that even near-4-bit methods can maintain most of the original model's performance. However, 1-bit quantization remains particularly challenging. A common strategy in 1-bit quantization is to determine binary weights by matching the full-precision parameters, following a weight-driven criterion. However, this objective is not…
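To make the weight-driven criterion mentioned in the abstract concrete, here is a minimal sketch of one common instance of it: sign-and-scale binarization, where each weight row is approximated as alpha * B with B in {-1, +1} and alpha chosen to minimize the weight reconstruction error. This is not the paper's method. The function name, the matrix shapes, and the random stand-in for calibration activations are all assumptions made for illustration.

```python
import numpy as np

def binarize_weight_driven(W):
    """Row-wise 1-bit approximation W ~= alpha * B (illustrative helper, not from the paper).

    For B = sign(W), the scale that minimizes ||W - alpha * B||_F^2
    for each row is the mean absolute value of that row.
    """
    B = np.where(W >= 0, 1.0, -1.0)                 # binary weights in {-1, +1}
    alpha = np.abs(W).mean(axis=1, keepdims=True)   # one scale per output row
    return alpha, B

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))    # hypothetical full-precision layer weights
X = rng.normal(size=(16, 32))   # hypothetical calibration activations (random stand-in)

alpha, B = binarize_weight_driven(W)
W_q = alpha * B

# The weight-driven objective only looks at the first quantity.
weight_err = np.linalg.norm(W - W_q) ** 2
# An output-driven objective would instead look at the layer output.
output_err = np.linalg.norm(W @ X - W_q @ X) ** 2

print(f"weight error: {weight_err:.3f}")
print(f"output error: {output_err:.3f}")
```

The sketch only computes the two error measures side by side. Minimizing the first does not by itself minimize the second, since the output error also depends on the activations `X`.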
