Accurate and Efficient Fine-Tuning of Quantized Large Language Models Through Optimal Balance

Ao Shen; Qiang Wang; Zhiquan Lai; Xionglve Li; Dongsheng Li

arXiv:2407.17029 · cs.LG · July 23, 2025

TL;DR

This paper introduces Q-BLoRA and QA-BLoRA, two methods for fine-tuning and deploying quantized large language models (LLMs). Both improve accuracy and efficiency by balancing the complexity of the adapter's inputs and outputs against the adapter's trainability.

Contribution

The paper proposes balanced low-rank adaptation techniques that improve both the fine-tuning of quantized LLMs and their deployment at low precision, addressing the performance degradation seen when quantization is combined with LoRA.

Findings

1. Q-BLoRA achieves state-of-the-art accuracy in fine-tuning quantized LLMs.
2. QA-BLoRA yields effective low-precision models for inference.
3. Both methods outperform existing baselines in various scenarios.

Abstract

Large Language Models (LLMs) have demonstrated impressive performance across various domains. However, the enormous number of model parameters makes fine-tuning challenging, significantly limiting their application and deployment. Existing solutions combine parameter quantization with Low-Rank Adaptation (LoRA), reducing memory usage but causing performance degradation. Additionally, converting fine-tuned models to low-precision representations further degrades performance. In this paper, we identify an imbalance in fine-tuning quantized LLMs with LoRA: overly complex adapter inputs and outputs versus low effective trainability of the adapter, leading to underfitting during fine-tuning. Thus, we propose Quantized LLMs fine-tuning with Balanced Low-Rank Adaptation (Q-BLoRA), which simplifies the adapter's inputs and outputs while increasing the adapter's rank to alleviate underfitting…
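The imbalance described in the abstract can be made concrete with a small NumPy sketch. It is an illustration under stated assumptions, not the authors' code: the 4-bit group-wise min-max quantizer, average pooling of the adapter input, and repetition of the adapter output are stand-ins chosen for clarity, the helper names (`quantize_groupwise`, `lora_forward`, `balanced_forward`) are hypothetical, and the operators used in Q-BLoRA may differ. What it shows is the parameter-budget arithmetic: shrinking the adapter's input and output dimensions by a factor of 4 lets the rank grow from 8 to 32 at an identical parameter count.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_groupwise(w, bits=4, group_size=32):
    """Min-max (asymmetric) quantization of each row in contiguous groups of input columns."""
    out_f, in_f = w.shape
    g = w.reshape(out_f, in_f // group_size, group_size)
    w_min = g.min(axis=-1, keepdims=True)
    w_max = g.max(axis=-1, keepdims=True)
    levels = 2 ** bits - 1
    scale = np.maximum(w_max - w_min, 1e-8) / levels
    q = np.clip(np.round((g - w_min) / scale), 0, levels)  # integer codes in [0, levels]
    return q, scale, w_min

def dequantize(q, scale, zero):
    """Map integer codes back to floats and restore the (out_f, in_f) layout."""
    w = q * scale + zero
    return w.reshape(w.shape[0], -1)

def lora_forward(x, w_frozen, A, B, alpha):
    """Standard LoRA: frozen (dequantized) base weight plus a trainable rank-r update B @ A."""
    r = A.shape[0]
    return x @ w_frozen.T + (alpha / r) * (x @ A.T) @ B.T

def balanced_forward(x, w_frozen, A, B, alpha, k_in, m_out):
    """Hypothetical 'balanced' adapter (assumed operators): average-pool the input over
    groups of k_in features, apply a higher-rank adapter in the reduced space, then
    repeat each adapter output m_out times to restore the full output width."""
    batch, in_f = x.shape
    r = A.shape[0]
    x_small = x.reshape(batch, in_f // k_in, k_in).mean(axis=-1)  # (batch, in_f / k_in)
    y_small = (x_small @ A.T) @ B.T                                # (batch, out_f / m_out)
    y_adapter = np.repeat(y_small, m_out, axis=-1)                 # (batch, out_f)
    return x @ w_frozen.T + (alpha / r) * y_adapter

in_f = out_f = 256
w = rng.standard_normal((out_f, in_f))
q, scale, zero = quantize_groupwise(w, bits=4, group_size=32)
w_deq = dequantize(q, scale, zero)   # frozen base weight seen by both adapters
x = rng.standard_normal((2, in_f))

# Plain LoRA with rank 8: parameters = 8 * (256 + 256) = 4096.
r = 8
A = rng.standard_normal((r, in_f)) * 0.01
B = np.zeros((out_f, r))             # B = 0, so the adapter starts as an exact no-op
lora_params = A.size + B.size

# Balanced variant with the same budget: pooling by 4 on both sides leaves 64 inputs
# and 64 outputs, so the rank can grow to 4096 // (64 + 64) = 32.
k_in = m_out = 4
r_bal = lora_params // (in_f // k_in + out_f // m_out)
A_bal = rng.standard_normal((r_bal, in_f // k_in)) * 0.01
B_bal = np.zeros((out_f // m_out, r_bal))

y_lora = lora_forward(x, w_deq, A, B, alpha=16)
y_bal = balanced_forward(x, w_deq, A_bal, B_bal, alpha=16, k_in=k_in, m_out=m_out)
print(lora_params, r_bal, A_bal.size + B_bal.size)  # 4096 32 4096
```

With `B` initialized to zero, both adapters start as exact no-ops on top of the dequantized base weight, as in standard LoRA. The trade-off in this sketch is that each adapter output is shared by `m_out` neighboring features and each adapter input summarizes `k_in` features, in exchange for a fourfold higher-rank update under the same parameter budget.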

Code & Models

Repositories

xiaocaigou/qbaraqahira (PyTorch, official)

Taxonomy

Methods: Adapter · LLaMA · ALIGN