Boost Post-Training Quantization via Null Space Optimization for Large Language Models

Jiaqi Zhao; Miao Zhang; Deng Xiang; Ming Li; Weili Guan; Liqiang Nie

arXiv:2506.11044 · cs.LG · October 28, 2025

TL;DR

This paper introduces a null-space-based approach to post-training quantization (PTQ) for large language models, significantly reducing quantization error and improving the performance of compressed models without extra memory overhead.

Contribution

It proposes Q2N, a novel null space projection module for LLM quantization, together with a theoretical derivation and an efficient approximation method, advancing the state of the art in model compression.

Findings

1. Effective reduction of quantization error on LLMs
2. Improved performance on models such as LLaMA3, DeepSeek, and Qwen3
3. No additional memory overhead during inference
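The third finding (no additional memory overhead) is attributed by Reviewer 03 to merging the correction into the per-channel scaling vector. The sketch below shows why such a merge costs nothing at inference, under one assumption: that the correction vector, called `alpha` here, scales each output channel of a weight quantized with per-output-channel scales. The quantizer `quantize_per_channel` (symmetric round-to-nearest) and all shapes are hypothetical stand-ins, not the paper's configuration.

```python
import numpy as np


def quantize_per_channel(W, n_bits=3):
    """Hypothetical symmetric per-output-channel round-to-nearest quantizer.

    Returns signed integer codes Q and one scale per row, so that
    W is approximately s[:, None] * Q.
    """
    qmax = 2 ** (n_bits - 1) - 1                 # e.g. 3 for 3-bit codes
    s = np.abs(W).max(axis=1) / qmax             # one scale per output channel
    Q = np.clip(np.round(W / s[:, None]), -qmax - 1, qmax)
    return Q.astype(np.int8), s


rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))                 # 8 output channels, 16 inputs
Q, s = quantize_per_channel(W)

# Hypothetical per-output-channel correction vector (assumption, see text).
alpha = 1.0 + 0.01 * rng.standard_normal(8)

# Applying alpha as a separate step after dequantization ...
W_two_step = alpha[:, None] * (s[:, None] * Q)

# ... gives the same weights as dequantizing with a scale merged offline,
# so no extra vector needs to be stored or applied at inference time.
s_merged = alpha * s
W_merged = s_merged[:, None] * Q

print(np.allclose(W_two_step, W_merged))  # True
```

Because dequantization is already a per-row multiply, folding `alpha` into the stored scale once, offline, reproduces the corrected weights exactly. If the correction instead acted per input channel, it could not be folded into per-output-channel scales this way.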

Abstract

Existing post-training quantization (PTQ) methods for large language models (LLMs) have achieved remarkable success. However, increasingly marginal performance gains suggest that existing quantization strategies are insufficient to support the development of more compressed models. To inspire new directions for future research, this paper introduces the concept of null space into LLM quantization. We argue that quantization error can be effectively alleviated by constraining the post-quantization weight perturbation to lie within the null space of the input activations. To validate this idea, we propose Q2N, a plug-and-play null space projection module for existing milestone PTQ baselines. Specifically, we first design an efficient and accurate null space projection approximation method tailored to the characteristics of LLMs. Subsequently, we theoretically derive a closed-form solution for an…
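The central idea of the abstract can be illustrated numerically. For a linear layer with output W·X, a weight perturbation ΔW changes the output by ΔW·X, which vanishes when every row of ΔW is orthogonal to the column space of the activations X, i.e. lies in the null space of Xᵀ. The sketch below is a minimal, hypothetical illustration and not the Q2N algorithm. It assumes synthetic low-rank activations so that an exact null space exists (real activation covariances are typically not exactly singular, which is presumably why the paper needs an approximation), and it picks that null space with a simple eigenvalue threshold (`rel_tol`, a name introduced here).

```python
import numpy as np


def approx_null_space_projector(X, rel_tol=1e-6):
    """Projector onto the approximate null space of X^T.

    X has shape (d_in, n_tokens). Eigenvectors of the covariance X @ X.T
    whose eigenvalues fall below rel_tol * (largest eigenvalue) are treated
    as the null space. This threshold rule is a hypothetical choice.
    """
    cov = X @ X.T                            # (d_in, d_in), symmetric PSD
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    mask = eigvals < rel_tol * eigvals.max()
    U0 = eigvecs[:, mask]                    # orthonormal null-space basis
    return U0 @ U0.T                         # P = U0 U0^T


rng = np.random.default_rng(0)
d_in, d_out, n_tokens, rank = 64, 32, 256, 16

# Synthetic rank-16 activations (assumption: activations concentrate
# in a low-dimensional subspace).
X = rng.standard_normal((d_in, rank)) @ rng.standard_normal((rank, n_tokens))

# An arbitrary perturbation standing in for quantization error.
dW = 0.05 * rng.standard_normal((d_out, d_in))

P = approx_null_space_projector(X)
dW_null = dW @ P                             # each row of dW projected onto null(X^T)

err_raw = np.linalg.norm(dW @ X)             # Frobenius norm of the output error
err_null = np.linalg.norm(dW_null @ X)       # near zero, up to rounding error
print(f"raw output error:       {err_raw:.3e}")
print(f"projected output error: {err_null:.3e}")
```

Note that `dW @ P` is in general no longer on the quantization grid, so a PTQ method cannot simply apply this projection after rounding; how Q2N reconciles the projection with quantization (the abstract mentions a closed-form solution) is not reproduced here.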

Peer Reviews

Decision: Submitted to ICLR 2026

Reviewer 01 · Rating 6 · Confidence 4

Strengths

(1) Novel Conceptual Framework: The idea of guiding quantization error into the null space is original and reframes the PTQ problem in a more targeted way than simply minimizing numerical error.
(2) Practical and Efficient Algorithm: The paper translates theory into a practical algorithm, with the memory-free `α` vector being a key element that makes the method viable for deployment without inference overhead.
(3) Comprehensive Validation: The approach is thoroughly evaluated on multiple modern…

Weaknesses

(1) Marginal Gains vs. Complexity: While gains are consistent, they can be modest. A more direct analysis of the trade-off between the one-time quantization complexity and the magnitude of performance improvement would be beneficial.
(2) Hyperparameter Sensitivity: The paper would be strengthened by a more detailed sensitivity analysis for the key hyperparameters `t` and `λ` to better guide practitioners on their selection.
(3) Limited Bit-Width Scope: While the paper focuses on 2-3 bit quantization…

Reviewer 02 · Rating 4 · Confidence 4

Strengths

1. Introducing null space theory into quantization error analysis is indeed a first in the LLM quantization literature (it was previously used mostly in LoRA-Null, AlphaEdit, etc.), and it is quite inspiring.
2. The core lemma (if the error lies in the input null space, the output error approaches zero) is logically rigorous, and its derivation is complete.
3. A complete derivation, pseudocode, and open-source links are provided, along with engineering execution specifications.
4. Compatibility has been…

Weaknesses

1. Limited contribution and marginal improvements: While the proposed null space optimization approach is novel, its methodology is overly simplistic, merely adding an approximate projection operation to existing frameworks such as GPTQ. Experimental results show limited performance improvement (only 1-3 percentage points for most tasks) and no significant improvement for some metrics.
2. Over-simplified projection mechanism: The authors claim to reduce error propagation by pro…

Reviewer 03 · Rating 6 · Confidence 5

Strengths

- The original element of the proposed solution is applying a null-space correction to the quantization error.
- The proposed quantization correction can be seamlessly integrated into existing quantized inference pipelines.
- It introduces no additional latency or memory overhead, since it can be merged into the per-channel scaling vector.
- Experimental results demonstrate consistent and significant improvements across various quantization methods, supporting the effectiveness and generality…

Weaknesses

- The relationship between the eigenvalue decomposition and the singular value decomposition of the covariance matrix is well established and widely used. Therefore, the “efficient eigenvalue decomposition” component should not be considered a novel contribution.
- It would strengthen the paper to include evaluations on quantization methods that explicitly consider activation statistics, such as AWQ. It is not clear whether such methods can benefit from the proposed approach.
- The WA quantization r…
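The relationship Reviewer 03 refers to can be checked in a few lines: for an activation matrix X of shape (d_in, n_tokens), the eigenvalues of the covariance X·Xᵀ are the squared singular values of X, and its eigenvectors are the left singular vectors of X up to sign. The sketch below uses synthetic Gaussian data and NumPy's `eigh` and `svd`; it makes no claim about which route Q2N actually takes.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((32, 512))                  # synthetic (d_in, n_tokens)

# Route 1: eigendecomposition of the (d_in, d_in) covariance.
eigvals, eigvecs = np.linalg.eigh(X @ X.T)          # ascending order

# Route 2: thin SVD of X itself, X = U @ diag(sv) @ Vt.
U, sv, Vt = np.linalg.svd(X, full_matrices=False)   # descending order

# Eigenvalues of X X^T are the squared singular values of X.
print(np.allclose(eigvals[::-1], sv**2))            # True

# Eigenvectors equal the left singular vectors up to a per-column sign.
signs = np.sign(np.sum(eigvecs[:, ::-1] * U, axis=0))
print(np.allclose(eigvecs[:, ::-1] * signs, U))     # True
```

Forming X·Xᵀ is cheap when d_in is much smaller than the number of calibration tokens, but it squares the condition number, so the smallest eigenvalues (exactly the ones that define a null space) are computed less accurately than the corresponding singular values from an SVD of X.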

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis