R2Q: Towards Robust 2-Bit Large Language Models via Residual Refinement Quantization

Jiayi Chen; Jieqi Shi; Jing Huo; Chen Wu

arXiv:2511.21736·cs.CL·December 1, 2025

Open Access

TL;DR

This paper introduces R2Q, a novel 2-bit quantization method for large language models that decomposes the process into two 1-bit steps, significantly improving accuracy and stability over existing methods.

Contribution

R2Q presents a residual refinement quantization framework that enhances 2-bit LLM quantization by using a two-stage 1-bit decomposition, improving performance and training stability.

Findings

01. R2Q outperforms existing 2-bit quantization methods across multiple benchmarks.

02. It improves training stability and accelerates convergence under extreme compression.

03. The modular design allows easy integration with existing quantization frameworks.

Abstract

The rapid progress of Large Language Models (LLMs) has brought substantial computational and memory demands, spurring the adoption of low-bit quantization. While 8-bit and 4-bit formats have become prevalent, extending quantization to 2 bits remains challenging due to severe accuracy degradation. To address this, we propose Residual Refinement Quantization (R2Q), a novel 2-bit quantization framework that decomposes the process into two sequential 1-bit sub-quantizations, forming an adaptive quantization lattice. Extensive evaluations on Llama, OPT, and Qwen across diverse benchmarks (covering question answering, commonsense reasoning, and language modeling) demonstrate that R2Q consistently outperforms existing 2-bit quantization methods in both fine-grained and coarse-grained settings. By refining quantization through a residual learning mechanism, R2Q enhances performance, improves…
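The idea of splitting a 2-bit quantizer into two sequential 1-bit steps can be illustrated with a generic residual binarization sketch. This is not the paper's algorithm, whose details are not given in this abstract; it is a minimal illustration under stated assumptions: each 1-bit step uses a sign code with a single scale equal to the mean absolute value (the least-squares scale for a sign code), and the second step quantizes the residual left by the first. All function names are hypothetical.

```python
# Generic residual binarization sketch (an assumption for illustration,
# not the R2Q method itself): w ~= a1 * s1 + a2 * s2 with s1, s2 in {-1, +1}.

def sign(x):
    """Map a real number to +1.0 or -1.0 (zero maps to +1.0)."""
    return 1.0 if x >= 0 else -1.0

def one_bit_quantize(values):
    """1-bit step: sign codes plus one scale.

    For fixed sign codes, the scale minimizing squared error is the
    mean absolute value of the inputs.
    """
    scale = sum(abs(v) for v in values) / len(values)
    signs = [sign(v) for v in values]
    return scale, signs

def residual_two_bit_quantize(weights):
    """Two sequential 1-bit steps: quantize, then quantize the residual."""
    a1, s1 = one_bit_quantize(weights)
    residual = [w - a1 * s for w, s in zip(weights, s1)]
    a2, s2 = one_bit_quantize(residual)
    return a1, s1, a2, s2

def dequantize(a1, s1, a2, s2):
    """Reconstruct weights; values lie on the 4-point grid {+-a1 +- a2}."""
    return [a1 * p + a2 * q for p, q in zip(s1, s2)]

def mse(xs, ys):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

if __name__ == "__main__":
    w = [0.9, -0.4, 0.1, -1.2]  # toy weights, made up for the example
    a1, s1, a2, s2 = residual_two_bit_quantize(w)
    one_bit = [a1 * s for s in s1]
    two_bit = dequantize(a1, s1, a2, s2)
    print("1-bit MSE:", mse(w, one_bit))
    print("2-bit MSE:", mse(w, two_bit))
```

Because the two scales are derived from the data, the four reconstruction levels are not fixed in advance, which is one way to read the "adaptive quantization lattice" phrasing. In this sketch the second step can never increase the squared error, since the residual scale is the least-squares fit for its sign codes.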

Taxonomy

Topics: Advanced Neural Network Applications · Topic Modeling · Natural Language Processing Techniques