Accurate LoRA-Finetuning Quantization of LLMs via Information Retention

Haotong Qin; Xudong Ma; Xingyu Zheng; Xiaoyang Li; Yang Zhang; Shouda Liu; Jie Luo; Xianglong Liu; Michele Magno

arXiv:2402.05445 · cs.LG · May 28, 2024 · 42 citations


TL;DR

This paper introduces IR-QLoRA, a LoRA-finetuning quantization method for LLMs that retains the information of the original model during quantization. It significantly improves accuracy at minimal additional computational cost across various models and frameworks.

Contribution

IR-QLoRA combines Information Calibration Quantization and Information Elastic Connection to improve the accuracy of quantized LLMs while preserving efficiency and versatility.
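Information calibration quantization is summarized here only at a high level. The following is a minimal sketch, assuming the calibration searches for a shift τ that maximizes the Shannon entropy of the quantized codes, on a grid centered at the block median and spanning one standard deviation on either side. The grid, the helper names (`quantize`, `calibrate_tau`, etc.) and the uniform absmax quantizer are all hypothetical; the quantizer is a simple stand-in for the NormalFloat (NF4) quantizer used by QLoRA. This is not the authors' implementation.

```python
import math
import statistics
from collections import Counter


def quantize(weights, tau, bits=4):
    """Uniform absmax quantizer on weights shifted by tau (hypothetical stand-in for NF4)."""
    shifted = [w - tau for w in weights]
    scale = max(abs(s) for s in shifted) or 1.0  # avoid division by zero
    levels = 2 ** (bits - 1) - 1  # 7 for 4 bits, so codes lie in [-7, 7]
    codes = [round(s / scale * levels) for s in shifted]
    return codes, scale


def dequantize(codes, scale, tau, bits=4):
    """Map integer codes back to real values and undo the shift."""
    levels = 2 ** (bits - 1) - 1
    return [c * scale / levels + tau for c in codes]


def entropy(codes):
    """Shannon entropy (in bits) of the empirical distribution of codes."""
    n = len(codes)
    return -sum(k / n * math.log2(k / n) for k in Counter(codes).values())


def calibrate_tau(weights, bits=4, steps=41, span=1.0):
    """Hypothetical entropy-maximizing calibration: pick tau on a grid of
    median +/- span * std that maximizes the entropy of the quantized codes."""
    center = statistics.median(weights)
    std = statistics.pstdev(weights)
    grid = [center + span * std * (2 * i / (steps - 1) - 1) for i in range(steps)]
    return max(grid, key=lambda t: entropy(quantize(weights, t, bits)[0]))


# Usage on a skewed, deterministic toy block of 256 weights.
w = [((i * 37) % 101 / 100.0) ** 3 for i in range(256)]
tau = calibrate_tau(w)
codes, scale = quantize(w, tau)
w_hat = dequantize(codes, scale, tau)
print(f"tau={tau:.4f}  entropy={entropy(codes):.3f} bits")
```

Because an odd `steps` puts the median itself on the grid, the calibrated codes are never less informative (in entropy) than simply centering the block at its median. Spreading the codes over more quantization levels is one way to read "retaining the original information" in the quantized parameters.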

Findings

1. Achieves a 1.4% accuracy improvement on MMLU for 4-bit LLaMA-7B.

2. Significantly improves quantized LLM performance with only 0.31% additional time.

3. Compatible with multiple quantization frameworks and models.

Abstract

The LoRA-finetuning quantization of LLMs has been extensively studied to obtain accurate yet compact LLMs for deployment on resource-constrained hardware. However, existing methods cause the quantized LLM to severely degrade and even fail to benefit from the finetuning of LoRA. This paper proposes a novel IR-QLoRA for pushing quantized LLMs with LoRA to be highly accurate through information retention. The proposed IR-QLoRA mainly relies on two technologies derived from the perspective of unified information: (1) statistics-based Information Calibration Quantization allows the quantized parameters of LLM to retain original information accurately; (2) finetuning-based Information Elastic Connection makes LoRA utilize elastic representation transformation with diverse information. Comprehensive experiments show that IR-QLoRA can significantly improve accuracy across LLaMA and LLaMA2…
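The abstract does not spell out the form of the Information Elastic Connection. The sketch below is a hypothetical, parameter-free illustration of "elastic representation transformation" around a LoRA pair: the input (width h) is group-averaged down to the rank r and added to A's output, and the rank-r activation is tiled up to the output width o and added to B's output. The helper names (`group_mean`, `tile`, `elastic_lora_delta`) and the exact shortcuts are assumptions, not the authors' code (which is in the htqin/ir-qlora repository).

```python
def matvec(m, v):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def group_mean(x, r):
    """Parameter-free h -> r map: average r contiguous groups of x (needs h % r == 0)."""
    g = len(x) // r
    return [sum(x[i * g:(i + 1) * g]) / g for i in range(r)]


def tile(z, o):
    """Parameter-free r -> o map: repeat z end to end (needs o % r == 0)."""
    return [z[i % len(z)] for i in range(o)]


def elastic_lora_delta(x, A, B):
    """LoRA update B @ (A @ x) with hypothetical parameter-free elastic shortcuts
    around A (shape r x h) and B (shape o x r)."""
    r, h, o = len(A), len(x), len(B)
    if h % r or o % r:
        raise ValueError("input and output widths must be multiples of the rank")
    z = [a + m for a, m in zip(matvec(A, x), group_mean(x, r))]
    return [b + t for b, t in zip(matvec(B, z), tile(z, o))]


# Usage: h = 8, r = 2, o = 4 with A and B at zero, so only the shortcuts contribute.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
A = [[0.0] * 8 for _ in range(2)]
B = [[0.0] * 2 for _ in range(4)]
print(elastic_lora_delta(x, A, B))  # [2.5, 6.5, 2.5, 6.5]
```

The shortcuts add no trainable parameters, so the extra cost is a few additions per token. One consequence of this particular sketch: with the usual LoRA initialization B = 0, plain LoRA's update starts at zero, whereas this update does not. The abstract does not say how the authors handle initialization.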


Code & Models

Repositories

htqin/ir-qlora (PyTorch, official)
