R2Q: Towards Robust 2-Bit Large Language Models via Residual Refinement Quantization
Jiayi Chen, Jieqi Shi, Jing Huo, Chen Wu

TL;DR
This paper introduces R2Q, a 2-bit quantization method for large language models that decomposes quantization into two sequential 1-bit steps, improving accuracy and training stability over existing 2-bit methods.
Contribution
R2Q presents a residual refinement quantization framework that enhances 2-bit LLM quantization by using a two-stage 1-bit decomposition, improving performance and training stability.
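To make the two-stage idea concrete, here is a minimal NumPy sketch of generic residual binarization: a first 1-bit pass quantizes the weights, and a second 1-bit pass quantizes the error left by the first. This is an illustration under assumptions, not the authors' implementation. The function names are hypothetical, and the closed-form per-tensor scale (mean absolute value) is a standard choice for sign codes that may differ from how R2Q actually determines or learns its scales.

```python
import numpy as np


def one_bit_quantize(x):
    """Binarize x to {-alpha, +alpha}.

    Assumption: alpha = mean(|x|), the least-squares optimal scale
    for sign codes. R2Q itself may choose or learn scales differently.
    """
    alpha = float(np.mean(np.abs(x)))
    signs = np.where(x >= 0, 1.0, -1.0)
    return alpha, signs


def residual_two_bit_quantize(w):
    """Two sequential 1-bit passes: the second quantizes the first's residual."""
    a1, s1 = one_bit_quantize(w)
    residual = w - a1 * s1
    a2, s2 = one_bit_quantize(residual)
    return a1, s1, a2, s2


def dequantize(a1, s1, a2, s2):
    """Reconstruct weights from two sign tensors and two scales."""
    return a1 * s1 + a2 * s2


# Illustrative data only: random Gaussian "weights", not a real model.
rng = np.random.default_rng(0)
w = rng.normal(size=1024)

a1, s1, a2, s2 = residual_two_bit_quantize(w)
w_hat = dequantize(a1, s1, a2, s2)

err_1bit = float(np.mean((w - a1 * s1) ** 2))  # error after the first pass
err_2bit = float(np.mean((w - w_hat) ** 2))    # error after the residual pass
levels = np.unique(w_hat)                      # reconstruction levels actually used

print("levels:", levels)
print("1-bit MSE:", err_1bit, "2-bit MSE:", err_2bit)
```

Each weight is stored as two sign bits, and the reconstruction takes at most four values (±a1 ± a2) whose positions depend on the data rather than on a fixed uniform grid. With the mean-absolute-value scale, the second pass can never increase the squared error left by the first pass.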
Findings
R2Q outperforms existing 2-bit quantization methods across multiple benchmarks.
It improves training stability and accelerates convergence under extreme compression.
The modular design allows easy integration with existing quantization frameworks.
Abstract
The rapid progress of Large Language Models (LLMs) has brought substantial computational and memory demands, spurring the adoption of low-bit quantization. While 8-bit and 4-bit formats have become prevalent, extending quantization to 2 bits remains challenging due to severe accuracy degradation. To address this, we propose Residual Refinement Quantization (R2Q), a novel 2-bit quantization framework that decomposes the process into two sequential 1-bit sub-quantizations, forming an adaptive quantization lattice. Extensive evaluations on Llama, OPT, and Qwen across diverse benchmarks (covering question answering, commonsense reasoning, and language modeling) demonstrate that R2Q consistently outperforms existing 2-bit quantization methods in both fine-grained and coarse-grained settings. By refining quantization through a residual learning mechanism, R2Q enhances performance, improves…
Taxonomy
Topics: Advanced Neural Network Applications · Topic Modeling · Natural Language Processing Techniques
