RILQ: Rank-Insensitive LoRA-based Quantization Error Compensation for Boosting 2-bit Large Language Model Accuracy

Geonho Lee; Janghwan Lee; Sukjin Hong; Minsoo Kim; Euijai Ahn; Du-Seong Chang; Jungwook Choi

arXiv:2412.01129·cs.LG·March 31, 2025

TL;DR

RILQ introduces a rank-insensitive method for quantization error compensation, significantly improving 2-bit LLM accuracy while maintaining efficiency, addressing limitations of previous approaches in ultra-low-bit scenarios.

Contribution

The paper proposes RILQ, a novel rank-insensitive error compensation technique that enhances 2-bit LLM performance, filling a gap in quantization error correction for ultra-low-bit models.

Findings

1. Consistent accuracy improvements on LLaMA-2 and LLaMA-3.
2. Effective across various state-of-the-art quantizers.
3. Maintains computational efficiency comparable to existing methods.

Abstract

Low-rank adaptation (LoRA) has become the dominant method for parameter-efficient LLM fine-tuning, and LoRA-based quantization error compensation (LQEC) has emerged as a powerful tool for recovering accuracy in compressed LLMs. However, LQEC has underperformed in sub-4-bit scenarios, and no prior work has investigated why. We propose RILQ (Rank-Insensitive LoRA-based Quantization Error Compensation) to understand this fundamental limitation and boost 2-bit LLM accuracy. Building on a rank analysis that reveals the rank-insensitive nature of the model-wise activation discrepancy loss, RILQ employs this loss to adjust adapters cooperatively across layers, enabling robust error compensation with low-rank adapters. Evaluations on LLaMA-2 and LLaMA-3 demonstrate RILQ's consistent improvements in 2-bit quantized inference across various state-of-the-art quantizers and enhanced accuracy in…
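To make the idea of a model-wise activation discrepancy loss concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy stack of linear layers with ReLU, a simple per-matrix round-to-nearest 2-bit quantizer, and a mean-squared error on the final outputs; the names `quantize_2bit`, `forward`, and `model_wise_loss` are hypothetical. The point it illustrates is that the loss is measured once at the model output, so the adapters of all layers share one objective instead of each layer matching its own weights or activations.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def quantize_2bit(w):
    """Toy quantizer (assumption): round to 4 uniform levels per matrix."""
    lo = min(min(r) for r in w)
    hi = max(max(r) for r in w)
    step = (hi - lo) / 3 or 1.0  # 4 levels -> 3 intervals
    return [[lo + round((v - lo) / step) * step for v in r] for r in w]

def forward(x, weights):
    """Linear layers with ReLU between them (no ReLU after the last layer)."""
    h = x
    for i, w in enumerate(weights):
        h = matmul(h, w)
        if i < len(weights) - 1:
            h = [[max(0.0, v) for v in r] for r in h]
    return h

def model_wise_loss(x, fp_weights, q_weights, adapters):
    """MSE between the full-precision output and the output of the
    quantized model whose layers are compensated as Q + L1 @ L2."""
    compensated = [add(q, matmul(l1, l2)) for q, (l1, l2) in zip(q_weights, adapters)]
    ref = forward(x, fp_weights)
    out = forward(x, compensated)
    n = len(ref) * len(ref[0])
    return sum((a - b) ** 2 for ra, rb in zip(ref, out) for a, b in zip(ra, rb)) / n

# Tiny demo: two 4x4 layers, rank-1 adapters initialised to zero.
random.seed(0)
dim, rank, n_layers = 4, 1, 2
fp = [[[random.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
      for _ in range(n_layers)]
q = [quantize_2bit(w) for w in fp]
zero_adapters = [([[0.0] * rank for _ in range(dim)], [[0.0] * dim for _ in range(rank)])
                 for _ in range(n_layers)]
x = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(8)]
loss_before = model_wise_loss(x, fp, q, zero_adapters)
print("model-wise loss with zero adapters:", loss_before)
```

In an actual training setup, the adapter matrices would be updated by gradient descent on this single loss, typically with an autograd framework such as PyTorch; the sketch only evaluates the objective.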

Code & Models

Repositories

aiha-lab/rilq
PyTorch · Official

Videos

RILQ: Rank-Insensitive LoRA-Based Quantization Error Compensation for Boosting 2-Bit Large Language Model Accuracy
