Chain of LoRA: Efficient Fine-tuning of Language Models via Residual Learning

Wenhan Xia; Chengwei Qin; Elad Hazan

arXiv:2401.04151·cs.LG·January 10, 2024·5 cites

TL;DR

This paper introduces Chain of LoRA (COLA), an iterative fine-tuning framework that improves on LoRA through residual learning: learned LoRA modules are merged into the pre-trained model parameters. The authors report better results on language models with no additional computational cost or memory overhead.

Contribution

COLA is an iterative optimization method, inspired by the Frank-Wolfe algorithm, that merges learned LoRA modules into the model weights to narrow the generalization gap between LoRA and full-parameter fine-tuning without additional compute or memory.

Findings

01 COLA outperforms standard LoRA across multiple models and tasks.

02 Theoretical convergence guarantees support COLA's effectiveness.

03 Empirical results show improved accuracy without extra computational costs.

Abstract

Fine-tuning is the primary methodology for tailoring pre-trained large language models to specific tasks. As model scale and task diversity grow, parameter-efficient fine-tuning methods become increasingly important. One of the most widely used families of methods is low-rank adaptation (LoRA) and its variants. LoRA encodes the weight update as the product of two low-rank matrices. Despite its advantages, LoRA falls short of full-parameter fine-tuning in terms of generalization error on certain tasks. We introduce Chain of LoRA (COLA), an iterative optimization framework inspired by the Frank-Wolfe algorithm, to bridge the gap between LoRA and full-parameter fine-tuning without incurring additional computational cost or memory overhead. COLA employs a residual learning procedure where it merges learned LoRA modules into the pre-trained language model parameters and…
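
The merge-and-continue idea in the abstract can be illustrated on a toy problem. The following is a minimal sketch, not the authors' implementation: it fits a frozen weight matrix W to a hypothetical target T by training a low-rank update B A with plain gradient descent, merging B A into W, and repeating. What happens after the merge is cut off in the abstract above; this sketch assumes a fresh LoRA module is initialized and trained at each stage. The squared-error objective, the function names, and all hyperparameters are assumptions chosen for illustration.

```python
import random

def matmul(X, Y):
    # Plain-Python matrix product for small lists of lists.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def add(X, Y, scale=1.0):
    # Elementwise X + scale * Y.
    return [[x + scale * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def loss(W, T):
    # Toy objective (an assumption): 0.5 * ||W - T||_F^2.
    return 0.5 * sum((w - t) ** 2 for rw, rt in zip(W, T) for w, t in zip(rw, rt))

def train_lora(W, T, rank, steps, lr, rng):
    # Train one low-rank update B @ A while W stays frozen.
    d_out, d_in = len(W), len(W[0])
    B = [[0.0] * rank for _ in range(d_out)]  # zero init, so B @ A starts at 0
    A = [[rng.gauss(0.0, 0.5) for _ in range(d_in)] for _ in range(rank)]
    for _ in range(steps):
        G = add(add(W, matmul(B, A)), T, scale=-1.0)  # gradient w.r.t. (W + B A)
        grad_B = matmul(G, transpose(A))
        grad_A = matmul(transpose(B), G)
        B = add(B, grad_B, scale=-lr)
        A = add(A, grad_A, scale=-lr)
    return B, A

def chain_of_lora(W, T, rank=1, stages=3, steps=300, lr=0.05, seed=0):
    # Repeat: train a fresh low-rank module, merge it into W, record the loss.
    rng = random.Random(seed)
    losses = [loss(W, T)]
    for _ in range(stages):
        B, A = train_lora(W, T, rank, steps, lr, rng)
        W = add(W, matmul(B, A))  # merge the learned module into the weights
        losses.append(loss(W, T))
    return W, losses

# Hypothetical 3x3 example: start from zero weights and a full-rank target.
T = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.5]]
W0 = [[0.0] * 3 for _ in range(3)]
W_final, losses = chain_of_lora(W0, T)
```

In this toy setting a single rank-1 module can at best recover the best rank-1 approximation of T - W, whereas each later stage in the chain fits part of the residual left by the merged modules. Only one low-rank module is trainable at any time, so the per-stage parameter count matches that of a single LoRA module of the same rank.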

Taxonomy

Topics: Machine Learning and ELM · Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques

Methods: COLA