3-in-1: 2D Rotary Adaptation for Efficient Finetuning, Efficient Batching and Composability

Baohao Liao; Christof Monz

arXiv:2409.00119 · cs.LG · November 5, 2024

TL;DR

This paper introduces RoAd, a novel 2D rotation-based method for parameter-efficient fine-tuning of large language models that improves adaptability, batching efficiency, and interpretability with minimal parameter overhead.

Contribution

RoAd provides a simple 2D rotation technique that addresses multiple challenges in PEFT, including multi-adapter deployment, efficiency, and interpretability, with state-of-the-art performance.
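
To make the idea of a 2D rotation concrete, here is a minimal sketch in plain Python. It assumes the hidden vector is split into adjacent pairs of dimensions and that each pair has its own trainable angle. The function name `rotate_pairs`, the adjacent-pair grouping, and the example values are illustrative assumptions, not the authors' implementation, and any further parameterization the paper uses is not reproduced here.

```python
import math

def rotate_pairs(h, thetas):
    """Rotate each adjacent pair (h[2i], h[2i+1]) by the angle thetas[i].

    Hypothetical helper: the pairing scheme is an assumption for illustration.
    """
    if len(h) != 2 * len(thetas):
        raise ValueError("h must have exactly two entries per angle")
    out = []
    for i, theta in enumerate(thetas):
        x, y = h[2 * i], h[2 * i + 1]
        c, s = math.cos(theta), math.sin(theta)
        # Standard 2D rotation matrix [[c, -s], [s, c]] applied to (x, y).
        out.extend([c * x - s * y, s * x + c * y])
    return out

hidden = [1.0, 0.0, 3.0, 4.0]      # toy hidden vector with two pairs
angles = [math.pi / 2, 0.0]        # one angle per pair
rotated = rotate_pairs(hidden, angles)
```

In this sketch a vector of dimension d needs only d/2 angles, and a pure rotation leaves the vector's norm unchanged.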

Findings

1. Achieves high performance on GLUE and reasoning tasks with fewer than 0.1% of parameters trainable.

2. Enables efficient batching of requests that use different adapters, with minimal overhead.

3. Supports interpretability, as shown in distributed interchange intervention experiments.

Abstract

Parameter-efficient finetuning (PEFT) methods effectively adapt large language models (LLMs) to diverse downstream tasks, reducing storage and GPU memory demands. Despite these advantages, several applications pose new challenges to PEFT beyond mere parameter efficiency. One notable challenge involves the efficient deployment of LLMs equipped with multiple task- or user-specific adapters, particularly when different adapters are needed for distinct requests within the same batch. Another challenge is the interpretability of LLMs, which is crucial for understanding how LLMs function. Previous studies introduced various approaches to address different challenges. In this paper, we introduce a novel method, RoAd, which employs a straightforward 2D rotation to adapt LLMs and addresses all the above challenges: (1) RoAd is remarkably parameter-efficient, delivering optimal performance on…
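
The batching challenge described in the abstract can be illustrated with a small sketch. A rotation of dimension pairs can be written with element-wise products only, so each request in a batch can use its own adapter by looking up that adapter's coefficient vectors. Everything below (the names `coefficients`, `swap_pairs`, `batched_rotate`, the adapter labels, and the adjacent-pair layout) is a hypothetical illustration in plain Python, not the authors' code.

```python
import math

def coefficients(thetas):
    """Expand per-pair angles into per-dimension cos and sin vectors."""
    cos_vec, sin_vec = [], []
    for t in thetas:
        cos_vec += [math.cos(t)] * 2
        sin_vec += [math.sin(t)] * 2
    return cos_vec, sin_vec

def swap_pairs(h):
    """Map (x0, y0, x1, y1, ...) to (-y0, x0, -y1, x1, ...)."""
    out = []
    for i in range(0, len(h), 2):
        out += [-h[i + 1], h[i]]
    return out

def batched_rotate(batch, adapter_ids, adapters):
    """Apply a possibly different adapter to every request in the batch.

    Each output is cos * h + sin * swap_pairs(h), computed element-wise.
    In a tensor framework the outer loop would typically become a single
    broadcasted operation; that is an assumption, not shown here.
    """
    results = []
    for h, name in zip(batch, adapter_ids):
        cos_vec, sin_vec = adapters[name]
        h_swapped = swap_pairs(h)
        results.append([c * a + s * b
                        for c, s, a, b in zip(cos_vec, sin_vec, h, h_swapped)])
    return results

# Two hypothetical adapters, each defined by one angle per pair.
adapters = {
    "task_a": coefficients([math.pi / 2, 0.0]),
    "task_b": coefficients([0.0, math.pi]),
}
batch = [[1.0, 0.0, 3.0, 4.0], [1.0, 0.0, 3.0, 4.0]]
outputs = batched_rotate(batch, ["task_a", "task_b"], adapters)
```

Because no per-request matrix multiplication is involved, mixing adapters within one batch costs only a lookup plus element-wise arithmetic in this sketch.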

Code & Models

Repositories

baohaoliao/road (official)

Videos

3-in-1: 2D Rotary Adaptation for Efficient Finetuning, Efficient Batching and Composability (SlidesLive)

Taxonomy

Topics: Modular Robots and Swarm Intelligence