Boost Post-Training Quantization via Null Space Optimization for Large Language Models
Jiaqi Zhao, Miao Zhang, Deng Xiang, Ming Li, Weili Guan, Liqiang Nie

TL;DR
This paper introduces a null space-based approach to post-training quantization for large language models, significantly reducing quantization error and improving model compression without extra memory overhead.
Contribution
It proposes a novel null space projection module, Q2N, for LLM quantization, with a theoretical derivation and an efficient approximation method, advancing the state-of-the-art in model compression.
Findings
Effective reduction of quantization error on LLMs
Improved performance on models like LLaMA3, DeepSeek, Qwen3
No additional memory overhead during inference
Abstract
Existing post-training quantization (PTQ) methods for large language models (LLMs) have achieved remarkable success. However, the increasingly marginal performance gains suggest that existing quantization strategies are insufficient to support the development of more compressed models. To inspire new directions for future research, this paper introduces the concept of null space into LLM quantization. We argue that the quantization error can be effectively alleviated by constraining the post-quantization weight perturbation to lie within the null space of input activations. To validate this idea, we propose Q2N, a plug-and-play null space projection module for existing milestone PTQ baselines. Specifically, we first design an efficient and accurate null space projection approximation method tailored to the characteristics of LLMs. Subsequently, we theoretically derive a closed-form solution for an…
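The central claim of the abstract, that a weight perturbation lying in the null space of the input activations leaves the layer output unchanged, can be illustrated with a small NumPy sketch. This is not the Q2N module itself: the function name `null_space_projector`, the relative eigenvalue threshold, the `Y = W X` layout, and the synthetic low-rank activations are all assumptions made for illustration, and the random perturbation is only a stand-in for a real quantization error.

```python
import numpy as np

def null_space_projector(X, rel_threshold=1e-8):
    """Return a projector P with P @ X ~= 0 (hypothetical helper, not from the paper).

    X has shape (d_in, n_tokens). Eigenvectors of X @ X.T whose eigenvalues are
    negligible relative to the largest one span the directions that the
    activations never excite.
    """
    cov = X @ X.T                              # (d_in, d_in), symmetric PSD
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    null_mask = eigvals <= rel_threshold * eigvals.max()
    U0 = eigvecs[:, null_mask]                 # orthonormal basis of the null space
    return U0 @ U0.T

rng = np.random.default_rng(0)
d_in, d_out, n_tokens, rank = 16, 8, 64, 10

# Synthetic activations of rank 10, so a 6-dimensional null space exists.
X = rng.standard_normal((d_in, rank)) @ rng.standard_normal((rank, n_tokens))
W = rng.standard_normal((d_out, d_in))

# Stand-in for the perturbation introduced by quantizing W.
delta = 0.1 * rng.standard_normal((d_out, d_in))

P = null_space_projector(X)
delta_proj = delta @ P                         # perturbation restricted to the null space

err_raw = np.linalg.norm((W + delta) @ X - W @ X)
err_proj = np.linalg.norm((W + delta_proj) @ X - W @ X)
print(f"output error, raw perturbation:       {err_raw:.3e}")
print(f"output error, projected perturbation: {err_proj:.3e}")
```

In this toy setting the activations are exactly low rank, so the projected perturbation changes the output only at the level of floating-point noise. Activations in real LLMs are generally not exactly low rank, which is presumably why the abstract refers to an approximate null space projection.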
Peer Reviews
Decision·Submitted to ICLR 2026
(1) Novel Conceptual Framework: The idea of guiding quantization error into the null space is original and reframes the PTQ problem in a more targeted way than simply minimizing numerical error. (2) Practical and Efficient Algorithm: The paper translates theory into a practical algorithm, with the memory-free `α` vector being a key element that makes the method viable for deployment without inference overhead. (3) Comprehensive Validation: The approach is thoroughly evaluated on multiple modern
(1) Marginal Gains vs. Complexity: While gains are consistent, they can be modest. A more direct analysis of the trade-off between the one-time quantization complexity and the magnitude of performance improvement would be beneficial. (2) Hyperparameter Sensitivity: The paper would be strengthened by a more detailed sensitivity analysis for the key hyperparameters `t` and `λ` to better guide practitioners on their selection. (3) Limited Bit-Width Scope: While the paper focuses on 2-3 bit quantiza
1. Introducing null space theory into quantization error analysis is indeed a first in LLM quantization literature (previously it was mostly used in LoRA-Null, AlphaEdit, etc.), and it is quite inspiring. 2. The core lemma (if the error is in the input null space, the output error approaches zero) is logically rigorous and its derivation is complete. 3. Complete derivation, pseudocode, and open-source links are provided, along with engineering execution specifications. 4. Compatibility has been
### **1. Limited contribution and marginal improvements**
While the proposed Null Space optimization approach is novel, its methodology is overly simplistic, merely adding an approximate projection operation to existing frameworks such as GPTQ. Experimental results show limited performance improvement (only 1–3 percentage points for most tasks) and no significant improvement for some metrics.
### **2. Over-simplified projection mechanism**
The authors claim to reduce error propagation by pro
- The proposed solution has the original element of applying the null-space correction to the quantization error.
- The proposed quantization correction can be seamlessly integrated into existing quantized inference pipelines.
- It introduces no additional latency or memory overhead since it can be merged into the per-channel scaling vector.
- Experimental results demonstrate consistent and significant improvements across various quantization methods, supporting the effectiveness and generality
- The relationship between eigenvalue decomposition and singular value decomposition of the covariance matrix is well established and widely used. Therefore, the “efficient eigenvalue decomposition” component should not be considered a novel contribution.
- It would strengthen the paper to include evaluations on quantization methods that explicitly consider activation statistics, such as AWQ. It is unclear whether such methods can benefit from the proposed approach.
- The WA quantization r
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
