QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models

Yuhui Xu; Lingxi Xie; Xiaotao Gu; Xin Chen; Heng Chang; Hengheng Zhang; Zhengsu Chen; Xiaopeng Zhang; Qi Tian

arXiv:2309.14717·cs.LG·October 10, 2023·21 cites

TL;DR

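QA-LoRA is a quantization-aware low-rank adaptation method that keeps an LLM's weights quantized (e.g., to INT4) during fine-tuning, reducing memory and time, and yields a fine-tuned model that stays quantized, so no separate post-training quantization step is needed and accuracy is maintained.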

Contribution

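It presents a novel algorithm that combines quantization and low-rank adaptation through group-wise operators, which increase the degrees of freedom of quantization while decreasing those of adaptation; the method is simple to implement and makes LLM fine-tuning more efficient.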

Findings

01 Reduces memory usage and time during fine-tuning
02 Maintains accuracy after quantization and adaptation
03 Applies to LLaMA and LLaMA2 models across various tasks

Abstract

Recent years have witnessed a rapid development of large language models (LLMs). Despite their strong ability in many language-understanding tasks, the heavy computational burden largely restricts the application of LLMs, especially when one needs to deploy them onto edge devices. In this paper, we propose a quantization-aware low-rank adaptation (QA-LoRA) algorithm. The motivation lies in the imbalanced degrees of freedom of quantization and adaptation, and the solution is to use group-wise operators that increase the degree of freedom of quantization while decreasing that of adaptation. QA-LoRA is easily implemented with a few lines of code, and it equips the original LoRA with two-fold abilities: (i) during fine-tuning, the LLM's weights are quantized (e.g., into INT4) to reduce time and memory usage; (ii) after fine-tuning, the LLM and auxiliary weights are naturally integrated…
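The group-wise idea in the abstract can be made concrete with a small sketch. It assumes asymmetric min-max quantization in which every run of `group_size` consecutive input channels of a weight column gets its own scale and offset; the function names and example weights are hypothetical, and this is not the authors' implementation. Shrinking the group lets small weights share a narrow range instead of a coarse step size dictated by outliers.

```python
from typing import List, Tuple


def quantize_groupwise(column: List[float], group_size: int, bits: int = 4
                       ) -> Tuple[List[int], List[float], List[float]]:
    """Asymmetric min-max quantization of one weight column (illustrative sketch).

    Each run of `group_size` consecutive input channels gets its own scale
    (alpha) and offset (beta), so each weight is approximated as alpha * code + beta.
    """
    if len(column) % group_size != 0:
        raise ValueError("column length must be divisible by group_size")
    levels = 2 ** bits - 1  # INT4 -> integer codes 0..15
    codes: List[int] = []
    scales: List[float] = []
    offsets: List[float] = []
    for start in range(0, len(column), group_size):
        group = column[start:start + group_size]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / levels if hi > lo else 1.0  # constant group: any scale works
        scales.append(scale)
        offsets.append(lo)  # offset beta is the group minimum
        codes.extend(round((w - lo) / scale) for w in group)
    return codes, scales, offsets


def dequantize_groupwise(codes: List[int], scales: List[float],
                         offsets: List[float], group_size: int) -> List[float]:
    """Reconstruct alpha * code + beta using each channel's group statistics."""
    return [scales[i // group_size] * q + offsets[i // group_size]
            for i, q in enumerate(codes)]


# Hypothetical column: small weights in the first half, outliers in the second.
column = [0.01, -0.02, 0.03, 0.00, 1.5, -1.2, 0.8, 2.0]
for gs in (8, 4):
    rec = dequantize_groupwise(*quantize_groupwise(column, gs), gs)
    errors = [abs(a - b) for a, b in zip(column, rec)]
    print(f"group_size={gs}: max error on first four weights = {max(errors[:4]):.4f}")
```

With `group_size=8` the first four weights share a step size of about 0.21 with the outliers; with `group_size=4` their step size drops to about 0.0033, at the cost of storing one more scale and offset.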

Peer Reviews

Decision·ICLR 2024 poster

Reviewer 01 · Rating 5 (marginally below the acceptance threshold) · Confidence 4

Strengths

**Addresses a Significant Issue** - QA-LoRA realizes QLoRA's potential through its ability to quantize the LoRA weights, effectively resolving the disparities observed between fine-tuning and inference in QLoRA.
**Streamlined Implementation** - The authors highlight the method's simplicity, emphasizing that it requires only two lines of code modification to yield impressive improvements.
**Thorough Assessment** - The evaluation is meticulous, with the authors examining a spectrum of competitive m

Weaknesses

Reasoning behind the method: it is not clear why all the c_ij, as defined in the paper, should be equal, which is the main motivation for the group-wise quantisation. I would be willing to raise my score given a better explanation of the method (see the questions).
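One way to see why an equality condition matters for merging is a toy example. In this sketch (an illustration under stated assumptions, not the paper's code), weights are stored as integer codes with a per-group scale `alpha` and offset `beta`, and the adapter's input is summed within each group, so its update s·AB has one row per group. Because that update is identical for every input channel sharing an offset, it can be added to `beta` and the merged layer keeps the same integer codes. If the update varied within a group, no single offset could absorb it and the weights would need re-quantization. All names and numbers are hypothetical; an average-pooling variant differs from the summation here only by a constant factor that can be folded into `A`.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of an (n x k) and a (k x m) matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def dequantize(codes: Matrix, alpha: Matrix, beta: Matrix, group_size: int) -> Matrix:
    """W~[i][j] = alpha[g][j] * codes[i][j] + beta[g][j], where g = i // group_size."""
    return [[alpha[i // group_size][j] * codes[i][j] + beta[i // group_size][j]
             for j in range(len(codes[0]))] for i in range(len(codes))]


def forward_unmerged(x, codes, alpha, beta, A, B, s, group_size):
    """Quantized base layer plus an adapter that only sees group-pooled inputs."""
    base = matmul([x], dequantize(codes, alpha, beta, group_size))[0]
    # Sum the inputs inside each group: the adapter has one input row per group.
    pooled = [sum(x[g * group_size:(g + 1) * group_size]) for g in range(len(A))]
    adapter = matmul(matmul([pooled], A), B)[0]
    return [y + s * d for y, d in zip(base, adapter)]


def merge_into_offsets(beta: Matrix, A: Matrix, B: Matrix, s: float) -> Matrix:
    """Fold s * (A @ B) into the per-group offsets; the integer codes are untouched."""
    delta = matmul(A, B)  # n_groups x d_out: one update row per group
    return [[beta[g][j] + s * delta[g][j] for j in range(len(beta[0]))]
            for g in range(len(beta))]


# Hypothetical toy layer: d_in = 4, d_out = 2, group_size = 2 (two groups), rank 1.
gs = 2
codes = [[3, 15], [0, 7], [9, 2], [12, 5]]
alpha = [[0.1, 0.05], [0.2, 0.02]]
beta = [[-0.3, 0.1], [0.0, -0.4]]
A = [[0.5], [-1.5]]  # n_groups x rank
B = [[0.8, -0.2]]    # rank x d_out
s = 2.0
x = [0.5, -1.0, 2.0, 0.25]

y_unmerged = forward_unmerged(x, codes, alpha, beta, A, B, s, gs)
merged_beta = merge_into_offsets(beta, A, B, s)
y_merged = matmul([x], dequantize(codes, alpha, merged_beta, gs))[0]
print(y_unmerged, y_merged)
```

The two outputs agree up to floating-point rounding, which is the sense in which the adapter is absorbed without leaving the quantized format.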

Reviewer 02 · Rating 8 (accept, good paper) · Confidence 4

Strengths

- This work removes a limitation of previous parameter-efficient tuning of LLMs by eliminating the need for a separate post-training quantization step, which reduces model accuracy.
- QA-LoRA further improves on the memory efficiency of the state of the art while preserving accuracy.
- The experiments are convincing, as they cover a wide range of scenarios.

Weaknesses

QA-LoRA introduces a hyperparameter (L, the group size). This adds another value to optimize, and it is unclear whether it can be chosen without tuning.
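To make the group-size trade-off concrete, the hypothetical helper below counts, for one linear layer, how many groups the input channels are split into (each group carrying its own quantization scale and offset) and how many parameters a group-pooled adapter needs. The layer dimensions and rank are illustrative assumptions, not values from the paper.

```python
def adapter_shape(d_in: int, d_out: int, rank: int, group_size: int) -> dict:
    """Parameter counts for a group-pooled adapter (hypothetical helper).

    A maps one pooled input value per group to `rank` features; B maps back to d_out.
    """
    if d_in % group_size != 0:
        raise ValueError("d_in must be divisible by group_size")
    n_groups = d_in // group_size
    return {
        "group_size": group_size,
        "n_groups": n_groups,          # groups, each with its own scale and offset
        "A_params": n_groups * rank,   # A: n_groups x rank
        "B_params": rank * d_out,      # B: rank x d_out, independent of group size
    }


# Illustrative sizes (assumed): a 4096 x 4096 projection with rank 64.
for gs in (32, 64, 128):
    print(adapter_shape(4096, 4096, 64, gs))
```

Only the number of groups, and with it the number of quantization statistics and the rows of A, changes with the group size; B stays fixed. Smaller groups give quantization more freedom and the adapter more input rows, which is the balance the hyperparameter controls.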

Reviewer 03 · Rating 6 (marginally above the acceptance threshold) · Confidence 4

Strengths

* The paper's organization, presentation, and references are good.
* The proposed method is sufficiently novel.

Weaknesses

* Parameter offset in experiments: The proposed method incorporates group-wise (sub-channel) quantization, which adds parameters for the scales. The proposed QA-LoRA also reduces the size of the low-rank matrices. However, these parameter offsets are not reflected in the results, which could mislead readers. It would be more informative to report the actual (or estimated) model size in MB/GB for each model.
* In the ablation study, only group size is examin
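The size accounting Reviewer 03 asks for can be estimated per layer with simple arithmetic. This sketch assumes INT4 codes plus one FP16 scale and one FP16 offset per group (a real checkpoint's storage formats may differ) and ignores any unmerged adapter weights; the 4096 × 4096 layer is an illustrative assumption, and the numbers are not taken from the paper.

```python
def effective_bits(bits: int, group_size: int, stat_bits: int = 16) -> float:
    """Bits per weight once each group's scale and offset are amortized.

    Assumes one scale and one offset per group, each stored in `stat_bits` bits.
    """
    return bits + 2 * stat_bits / group_size


def layer_mib(d_in: int, d_out: int, bits: int, group_size: int,
              stat_bits: int = 16) -> float:
    """Approximate storage of one quantized d_in x d_out matrix in MiB (2**20 bytes)."""
    total_bits = d_in * d_out * effective_bits(bits, group_size, stat_bits)
    return total_bits / 8 / 2**20


# Hypothetical 4096 x 4096 layer, INT4 codes, FP16 statistics.
for gs in (32, 64, 128):
    print(gs, effective_bits(4, gs), layer_mib(4096, 4096, 4, gs))
```

Under these assumptions, group sizes of 32, 64, and 128 cost 5, 4.5, and 4.25 bits per weight, so the statistics overhead is 25%, 12.5%, and 6.25% on top of the 4-bit codes.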

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis