OJBKQ: Objective-Joint Babai-Klein Quantization

Xinyu Wang; Ziyu Zhao; Peng Lu; Yu Gu; Xiao-Wen Chang

arXiv:2602.08376·cs.LG·February 10, 2026

Open Access

TL;DR

OJBKQ is a layer-wise post-training quantization (PTQ) method for large language models. It casts weight quantization as a joint optimization problem over activations and weights, with the aim of reducing the degradation that existing weight-only methods show under low-bit quantization.

Contribution

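It formulates weight quantization as a joint optimization problem over activations and weights, which yields an NP-hard box-constrained integer least squares (BILS) problem in each layer. For each column of the weight matrix, it applies an extended Babai nearest-plane algorithm and an extended version of Klein's randomized Babai algorithm to find a sub-optimal solution, improving low-bit quantization performance.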

Findings

01 Achieves lower perplexity at 3-4 bits compared to existing methods.

02 Maintains comparable computational cost with improved quantization quality.

03 Demonstrates effectiveness on large language models.

Abstract

Post-training quantization (PTQ) is widely used to compress large language models without retraining. However, many existing weight-only methods rely on heuristic objectives and greedy rounding, thus leading to noticeable degradation under low-bit quantization. In this work, we introduce OJBKQ (Objective-Joint Babai-Klein Quantization with K-Best Sampling), a layer-wise PTQ method that formulates weight quantization as a joint optimization problem over activations and weights. This formulation results in a multiple-right-hand-side box-constrained integer least squares (BILS) problem in each layer, which is NP-hard. For each column of the weight matrix, we apply an extended Babai nearest-plane algorithm and an extended version of Klein's randomized Babai algorithm to find the minimum-residual Babai-Klein point, a sub-optimal solution to the BILS problem. Experimental results on large…
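The per-column procedure described in the abstract can be illustrated with a minimal sketch. This is not the paper's extended algorithm: it shows only the textbook box-constrained Babai nearest-plane step, a simplified Klein-style randomized variant, and a "keep the minimum-residual candidate" loop. The function names (`babai_box`, `klein_box`, `min_residual_point`), the temperature `alpha`, the sample count `num_samples`, and the assumption that `A` has full column rank are all hypothetical choices made for illustration.

```python
import numpy as np

def babai_box(R, y_bar, lo, hi):
    """Nearest-plane rounding on the triangular system R x ~ y_bar, clipped to [lo, hi]."""
    n = R.shape[1]
    x = np.zeros(n, dtype=np.int64)
    for k in range(n - 1, -1, -1):  # back-substitution, last coordinate first
        c = (y_bar[k] - R[k, k + 1:] @ x[k + 1:]) / R[k, k]
        x[k] = int(np.clip(np.rint(c), lo, hi))
    return x

def klein_box(R, y_bar, lo, hi, alpha, rng):
    """Klein-style variant: sample each coordinate instead of rounding it.

    Assumption: integers in [lo, hi] are drawn with probability proportional to
    exp(-alpha * R[k, k]**2 * (c - z)**2); alpha is an illustrative knob.
    """
    n = R.shape[1]
    levels = np.arange(lo, hi + 1)
    x = np.zeros(n, dtype=np.int64)
    for k in range(n - 1, -1, -1):
        c = (y_bar[k] - R[k, k + 1:] @ x[k + 1:]) / R[k, k]
        logits = -alpha * (R[k, k] ** 2) * (c - levels) ** 2
        p = np.exp(logits - logits.max())  # subtract max for numerical stability
        x[k] = rng.choice(levels, p=p / p.sum())
    return x

def min_residual_point(A, y, lo, hi, num_samples=8, alpha=1.0, seed=0):
    """Return the lowest-residual point among the Babai point and sampled candidates."""
    Q, R = np.linalg.qr(A)  # reduced QR; assumes A has full column rank
    y_bar = Q.T @ y
    rng = np.random.default_rng(seed)
    best = babai_box(R, y_bar, lo, hi)
    best_res = np.linalg.norm(y - A @ best)
    for _ in range(num_samples):
        cand = klein_box(R, y_bar, lo, hi, alpha, rng)
        res = np.linalg.norm(y - A @ cand)
        if res < best_res:
            best, best_res = cand, res
    return best, best_res
```

In a quantization setting, one would presumably build `A` and `y` from calibration activations and a single weight column, with `lo = 0` and `hi = 2**bits - 1`; that mapping, along with any scale and zero-point handling, is an assumption here and is not taken from the paper. Because the deterministic Babai point is always among the candidates, the returned residual is never worse than plain nearest-plane rounding.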

Taxonomy

Topics: Advanced Neural Network Applications · Natural Language Processing Techniques · Advanced Data Compression Techniques