R-LoRA: Randomized Multi-Head LoRA for Efficient Multi-Task Learning
Jinda Liu, Yi Chang, Yuan Wu

TL;DR
R-LoRA introduces a randomized multi-head approach to enhance multi-task learning in large language models, achieving better performance, reduced memory usage, and faster training compared to traditional LoRA methods.
Contribution
This paper proposes R-LoRA, a novel multi-head randomization technique that improves multi-task learning efficiency and effectiveness in large language models.
Findings
R-LoRA outperforms standard LoRA in multi-task benchmarks.
It reduces GPU memory consumption and training time.
Increased head diversity improves task-specific feature learning.
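In multi-head LoRA, a "head" is one of several head (up-projection) matrices that sit on top of a shared low-rank representation. The exact diversity measure used in the paper is not stated here. A simple, commonly used proxy is the mean pairwise cosine similarity between the flattened head matrices: values near 1 mean the heads are nearly identical, and values near 0 mean they point in unrelated directions. The function name `mean_pairwise_cosine` and the example shapes are hypothetical, and the function assumes every head is nonzero. All-zero heads, as in plain LoRA's zero initialization, are identical by construction but have no defined cosine.

```python
import numpy as np


def mean_pairwise_cosine(heads):
    """Mean cosine similarity over all pairs of head matrices (illustrative proxy).

    heads: array of shape (n_heads, d_out, rank); every head must be nonzero.
    Lower values indicate more diverse heads.
    """
    flat = heads.reshape(heads.shape[0], -1)
    flat = flat / np.linalg.norm(flat, axis=1, keepdims=True)  # unit-normalize each head
    sims = flat @ flat.T                                         # pairwise cosine matrix
    n = flat.shape[0]
    # Average the off-diagonal entries (each unordered pair is counted twice).
    return (sims.sum() - np.trace(sims)) / (n * (n - 1))


rng = np.random.default_rng(0)
same = np.repeat(rng.standard_normal((1, 12, 8)), 3, axis=0)  # three copies of one head
diverse = rng.standard_normal((3, 12, 8))                     # three independent heads
print(round(float(mean_pairwise_cosine(same)), 3))  # 1.0
print(float(mean_pairwise_cosine(diverse)))         # close to 0 for random heads
```

Identical heads score exactly 1. Independently drawn heads score near 0, which is the kind of separation that randomizing the heads is meant to encourage.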
Abstract
Fine-tuning large language models (LLMs) is computationally expensive, and Low-Rank Adaptation (LoRA) provides a cost-effective solution by approximating weight updates through low-rank matrices. In real-world scenarios, LLMs are fine-tuned on data from multiple domains to perform tasks across various fields, embodying multi-task learning (MTL). LoRA often underperforms in such complex scenarios. To enhance LoRA's capability in multi-task learning, we propose R-LoRA, which incorporates Multi-Head Randomization. Multi-Head Randomization diversifies the head matrices through Multi-Head Dropout and Multi-Head Random Initialization, enabling more efficient learning of task-specific features while maintaining shared knowledge representation. Our approach not only improves performance in MTL but also reduces GPU memory usage and training time. Experiments show that R-LoRA's gains stem from…
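The abstract describes LoRA as approximating the weight update with low-rank matrices, and R-LoRA as diversifying several head matrices while keeping a shared representation. The following is a minimal NumPy sketch of such a forward pass under explicit assumptions. It is not the authors' implementation.

- A single shared down-projection `A` feeds several up-projection heads `B_i`. This is an assumed layout.
- The heads are mixed by a hypothetical softmax router `W_r`.
- Multi-Head Dropout is modeled as an independent dropout mask on the shared projection for each head.
- Multi-Head Random Initialization is modeled as a distinct small Gaussian initialization for each head, replacing plain LoRA's all-zero initialization of `B`.

The class name `MultiHeadLoRASketch` and all constructor parameters are illustrative.

```python
import numpy as np


class MultiHeadLoRASketch:
    """Hypothetical multi-head LoRA layer (forward pass only).

    Assumptions (not confirmed by the paper text above): shared down-projection A,
    per-head up-projections B_i, a softmax router, per-head dropout masks, and a
    distinct random init for every head.
    """

    def __init__(self, d_in, d_out, rank=8, n_heads=3, alpha=16.0,
                 dropout_p=0.1, init_std=0.01, seed=0):
        self.rng = np.random.default_rng(seed)
        # Frozen pretrained weight W0 (a random stand-in for this sketch).
        self.W0 = self.rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)
        # Shared down-projection A: the shared low-rank representation.
        self.A = self.rng.standard_normal((rank, d_in)) / np.sqrt(d_in)
        # Assumed Multi-Head Random Initialization: each head B_i gets its own
        # small random init instead of zeros, so the heads start out different.
        self.B = init_std * self.rng.standard_normal((n_heads, d_out, rank))
        # Hypothetical router producing per-example mixing weights over heads.
        self.W_r = self.rng.standard_normal((n_heads, d_in)) / np.sqrt(d_in)
        self.scale = alpha / rank
        self.p = dropout_p

    def _dropout(self, z):
        # Inverted dropout: zero each entry with probability p, rescale survivors.
        keep = self.rng.random(z.shape) >= self.p
        return z * keep / (1.0 - self.p)

    def forward(self, x, training=True):
        # x: (batch, d_in) -> output: (batch, d_out)
        base = x @ self.W0.T                        # frozen path W0 x
        shared = x @ self.A.T                       # (batch, rank), shared by all heads
        logits = x @ self.W_r.T                     # (batch, n_heads)
        logits -= logits.max(axis=1, keepdims=True)  # numerically stable softmax
        gate = np.exp(logits)
        gate /= gate.sum(axis=1, keepdims=True)
        delta = np.zeros_like(base)
        for i in range(self.B.shape[0]):
            # Assumed Multi-Head Dropout: a fresh, independent mask for each head.
            z = self._dropout(shared) if training else shared
            delta += gate[:, i:i + 1] * (z @ self.B[i].T)
        return base + self.scale * delta


layer = MultiHeadLoRASketch(d_in=16, d_out=12)
x = np.ones((4, 16))
print(layer.forward(x).shape)  # (4, 12)
```

Because the heads start from nonzero values, the layer's output at step 0 differs slightly from the frozen path `W0 x`, unlike zero-initialized LoRA. A small `init_std` keeps that offset small. The independent per-head masks mean each head is trained on a differently perturbed view of the same shared projection. That is one plausible way to push heads apart, assumed here for illustration.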
Taxonomy
Topics: Machine Learning and ELM · Face and Expression Recognition · Text and Document Classification Technologies
Methods: Dropout
