ROSA: Random Subspace Adaptation for Efficient Fine-Tuning

Marawan Gamal Abdel Hameed; Aristides Milios; Siva Reddy; Guillaume Rabusseau

arXiv:2407.07802·cs.LG·July 11, 2024

TL;DR

ROSA is a parameter-efficient fine-tuning method that adapts large models by training within randomly selected subspaces. It outperforms previous methods such as LoRA without adding inference latency, particularly on NLP tasks.

Contribution

ROSA introduces a flexible subspace adaptation approach that surpasses LoRA in performance and expressiveness while maintaining zero inference overhead.

Findings

01. ROSA outperforms LoRA on almost all GLUE tasks.

02. ROSA achieves better results on NLP generation tasks.

03. ROSA is more expressive than LoRA without extra memory during inference.

Abstract

Model training requires significantly more memory than inference. Parameter-efficient fine-tuning (PEFT) methods provide a means of adapting large models to downstream tasks using less memory. However, existing methods such as adapters, prompt tuning, or low-rank adaptation (LoRA) either introduce latency overhead at inference time or achieve subpar downstream performance compared with full fine-tuning. In this work, we propose Random Subspace Adaptation (ROSA), a method that outperforms previous PEFT methods by a significant margin while maintaining zero latency overhead at inference time. In contrast to previous methods, ROSA is able to adapt subspaces of arbitrarily large dimension, better approximating full fine-tuning. We demonstrate both theoretically and experimentally that this makes ROSA strictly more expressive than LoRA, without consuming additional memory…
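
The zero-latency property mentioned in the abstract can be illustrated with a small sketch. It shows only the generic idea of folding a trained low-rank update into a frozen weight matrix, as LoRA-style methods do. It is not ROSA's subspace selection procedure, which the abstract does not detail. The names W, B, A and the dimensions are illustrative assumptions, and pure Python is used so the example has no dependencies.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    """Element-wise sum of two matrices of equal shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def rand_matrix(rows, cols, rng):
    """Random matrix with entries in [-1, 1] (stand-in for real weights)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
d_out, d_in, rank = 6, 5, 2          # hypothetical layer sizes

W = rand_matrix(d_out, d_in, rng)    # frozen pretrained weight
B = rand_matrix(d_out, rank, rng)    # trainable factor, d_out x rank
A = rand_matrix(rank, d_in, rng)     # trainable factor, rank x d_in
x = rand_matrix(d_in, 1, rng)        # one input column vector

# Training-time forward pass: frozen path plus the low-rank path.
y_train = add(matmul(W, x), matmul(B, matmul(A, x)))

# After training, fold the update into the weight once.
# Inference then uses a single matrix of the original shape.
W_merged = add(W, matmul(B, A))
y_merged = matmul(W_merged, x)

max_diff = max(abs(p[0] - q[0]) for p, q in zip(y_train, y_merged))
trainable = rank * (d_out + d_in)    # parameters in B and A
full = d_out * d_in                  # parameters in W

print(f"max output difference: {max_diff:.2e}")
print(f"trainable parameters: {trainable} of {full}")
```

Because the merged matrix has the same shape as the original weight, the adapted layer costs the same at inference as the unadapted one. The update itself is limited to rank at most `rank`, which is the kind of restriction the abstract says ROSA relaxes.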

Code & Models

Repositories

rosa-paper/rosa (PyTorch, official)

Taxonomy

Topics: Speech and Audio Processing · Advanced Data Compression Techniques · Telecommunications and Broadcasting Technologies

Methods: Attention Is All You Need · Cosine Annealing · Linear Layer · Linear Warmup With Linear Decay · Weight Decay · Multi-Head Attention · Softmax · WordPiece · Linear Warmup With Cosine Annealing