LoRA: Low-Rank Adaptation of Large Language Models

Edward J. Hu; Yelong Shen; Phillip Wallis; Zeyuan Allen-Zhu; Yuanzhi Li; Shean Wang; Lu Wang; Weizhu Chen

arXiv:2106.09685 · cs.CL · October 19, 2021 · 2.4k cites

TL;DR

LoRA introduces a low-rank adaptation method that significantly reduces the number of trainable parameters in large language models, enabling efficient fine-tuning without sacrificing performance.

Contribution

The paper proposes LoRA, a novel low-rank adaptation technique that drastically decreases trainable parameters and memory usage during fine-tuning of large language models.

Findings

01. LoRA can reduce the number of trainable parameters by 10,000 times relative to GPT-3 175B fine-tuned with Adam.

02. LoRA matches or exceeds full fine-tuning in model quality.

03. LoRA adds no inference latency and improves training throughput.
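
One way to read finding 03 is sketched below. It assumes the trained update is a product of two small matrices, B and A, that can be folded into the frozen weight W once training is done, so that inference runs a single matrix multiply of the original shape. The matrices, sizes, and the helper names matmul, add, and merge are hypothetical and chosen only for illustration; this is not the authors' code.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two matrices with the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def merge(W, A, B):
    """Fold the low-rank update into the frozen weight: W' = W + B A."""
    return add(W, matmul(B, A))


# Toy integer values (hypothetical) so the comparison below is exact.
W = [[1, 2, 0], [0, 1, 3]]   # frozen weight, 2 x 3
B = [[1], [2]]               # low-rank factor, 2 x 1
A = [[1, 0, -1]]             # low-rank factor, 1 x 3
x = [[2], [1], [1]]          # input column vector, 3 x 1

# Training-time view: frozen path plus a separate low-rank path.
two_path = add(matmul(W, x), matmul(B, matmul(A, x)))

# Deployment view: one merged weight, one matrix multiply.
W_merged = merge(W, A, B)
one_path = matmul(W_merged, x)

assert one_path == two_path
```

Because the merged weight has the same shape as W, the deployed layer does the same amount of work as the original layer under this assumption.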

Abstract

An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example -- deploying independent instances of fine-tuned models, each with 175B parameters, is prohibitively expensive. We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. Compared to GPT-3 175B fine-tuned with Adam, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times. LoRA performs on-par or better than fine-tuning in model quality on…
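
The freeze-and-inject idea in the abstract can be illustrated with a minimal pure-Python sketch. It assumes a single weight matrix W of shape d x k, a rank r, and an update written as B A with B of shape d x r and A of shape r x k. The zero initialization of B follows the paper's description as I understand it; the sizes, seed, and function names are hypothetical. The parameter count is for one matrix only: the 10,000-times figure quoted above depends on the rank and on which matrices are adapted, neither of which this excerpt specifies.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two matrices with the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def lora_forward(x, W, A, B):
    """h = W x + B (A x). W stays frozen; only A and B would be trained."""
    return add(matmul(W, x), matmul(B, matmul(A, x)))


def trainable_params(d, k, r):
    """Return (full fine-tuning count, low-rank count) for one d x k matrix."""
    return d * k, r * (d + k)


random.seed(0)
d, k, r = 6, 4, 2  # toy sizes, hypothetical
W = [[random.gauss(0, 1) for _ in range(k)] for _ in range(d)]     # frozen, d x k
A = [[random.gauss(0, 0.01) for _ in range(k)] for _ in range(r)]  # trainable, r x k
B = [[0.0] * r for _ in range(d)]                                  # trainable, d x r, starts at zero
x = [[random.gauss(0, 1)] for _ in range(k)]                       # input column vector, k x 1

# With B at zero the update B A is zero, so the adapted layer
# reproduces the pre-trained layer exactly at the start of training.
h = lora_forward(x, W, A, B)
assert h == matmul(W, x)

# Per-matrix count for an assumed 4096 x 4096 weight at rank 8.
full, low_rank = trainable_params(4096, 4096, 8)
assert full // low_rank == 256
```

The count r(d + k) grows linearly in the matrix dimensions while d k grows with their product, which is why the savings widen as models get larger.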

Code & Models

Videos

What is LoRA? Low-Rank Adaptation for finetuning LLMs EXPLAINED · YouTube

The ARC Prize 2024 Winning Algorithm [Daniel Franzen and Jan Disselhoff] · YouTube

LoRA: Low-Rank Adaptation of Large Language Models · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Linear Warmup With Linear Decay · WordPiece · BERT · RoBERTa · DeBERTa · Absolute Position Encodings