3-in-1: 2D Rotary Adaptation for Efficient Finetuning, Efficient Batching and Composability
Baohao Liao, Christof Monz

TL;DR
This paper introduces RoAd, a 2D rotation-based method for parameter-efficient fine-tuning (PEFT) of large language models. It targets three goals at once: strong downstream adaptation, efficient batching of requests that use different adapters, and interpretability, all with minimal parameter overhead.
Contribution
RoAd provides a simple 2D rotation technique that addresses multiple challenges in PEFT, including multi-adapter deployment, efficiency, and interpretability, with state-of-the-art performance.
Findings
Achieves high performance on GLUE and reasoning tasks with less than 0.1% trainable parameters.
Enables efficient batching for multiple adapters with minimal overhead.
Supports interpretability analysis, as shown in distributed interchange intervention experiments.
Abstract
Parameter-efficient finetuning (PEFT) methods effectively adapt large language models (LLMs) to diverse downstream tasks, reducing storage and GPU memory demands. Despite these advantages, several applications pose new challenges to PEFT beyond mere parameter efficiency. One notable challenge involves the efficient deployment of LLMs equipped with multiple task- or user-specific adapters, particularly when different adapters are needed for distinct requests within the same batch. Another challenge is the interpretability of LLMs, which is crucial for understanding how LLMs function. Previous studies introduced various approaches to address different challenges. In this paper, we introduce a novel method, RoAd, which employs a straightforward 2D rotation to adapt LLMs and addresses all the above challenges: (1) RoAd is remarkably parameter-efficient, delivering optimal performance on…
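To make the batching idea concrete, here is a minimal NumPy sketch of a 2D rotary adapter applied to a batch in which each request uses a different adapter. The abstract only states that RoAd uses "a straightforward 2D rotation", so the details here are assumptions made for illustration: adjacent feature pairs are rotated, each pair has a single learned angle, and the names `rotate_pairs`, `adapter_bank`, and `adapter_ids` are hypothetical rather than taken from the authors' code.

```python
import numpy as np

def rotate_pairs(h, theta):
    """Rotate each adjacent feature pair (h[..., 2i], h[..., 2i+1]) by theta[..., i].

    h:     (batch, d) hidden states, d even
    theta: (batch, d // 2) rotation angles, one row per request
    """
    x, y = h[..., 0::2], h[..., 1::2]
    cos, sin = np.cos(theta), np.sin(theta)
    out = np.empty_like(h)
    # Standard 2D rotation, written element-wise so no per-request matmul is needed.
    out[..., 0::2] = cos * x - sin * y
    out[..., 1::2] = sin * x + cos * y
    return out

rng = np.random.default_rng(0)
h = rng.normal(size=(3, 4))              # 3 requests, hidden size 4
adapter_bank = rng.normal(size=(5, 2))   # 5 hypothetical adapters, 2 angles each
adapter_ids = np.array([0, 3, 3])        # adapter chosen by each request
theta = adapter_bank[adapter_ids]        # gather per-request angles: shape (3, 2)
z = rotate_pairs(h, theta)               # one vectorized call for the whole batch
```

Under these assumptions, serving different adapters in one batch reduces to an index lookup followed by element-wise multiplies and adds, which is one plausible reading of why the batching overhead can stay small.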
Taxonomy
Topics: Modular Robots and Swarm Intelligence
