SiRA: Sparse Mixture of Low Rank Adaptation

Yun Zhu; Nevan Wichers; Chu-Cheng Lin; Xinyi Wang; Tianlong Chen; Lei Shu; Han Lu; Canoee Liu; Liangchen Luo; Jindong Chen; Lei Meng

arXiv:2311.09179 · cs.CL · November 16, 2023 · 1 citation

TL;DR

SiRA introduces a sparse mixture of low-rank adaptation that leverages sparse expert routing and expert dropout to improve parameter-efficient tuning of large language models, outperforming previous dense methods like LoRA.
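
For reference, the "dense" baseline here is standard LoRA: a frozen weight $W_0$ plus a trainable low-rank update $\frac{\alpha}{r} B A$, where every adapter parameter touches every token. The NumPy sketch below follows the usual LoRA formulation (zero-initialized $B$, scaling $\alpha/r$); the function name `lora_forward` and the toy shapes are illustrative assumptions, not code from the paper.

```python
import numpy as np

def lora_forward(x, W0, A, B, alpha):
    """Dense LoRA forward pass: h = x W0^T + (alpha / r) * x A^T B^T.

    x:  (tokens, d_in)  input activations
    W0: (d_out, d_in)   frozen pretrained weight
    A:  (r, d_in)       trainable down-projection
    B:  (d_out, r)      trainable up-projection (zero-initialized in LoRA)
    """
    r = A.shape[0]
    scaling = alpha / r
    # Every token passes through the same A and B: all adapter params are "dense".
    return x @ W0.T + scaling * (x @ A.T) @ B.T

rng = np.random.default_rng(0)
d_in, d_out, r, tokens = 8, 6, 2, 4
x = rng.standard_normal((tokens, d_in))
W0 = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))  # B = 0 at init, so the adapter starts as a no-op

h = lora_forward(x, W0, A, B, alpha=16)
print(h.shape)                   # (4, 6)
print(np.allclose(h, x @ W0.T))  # True
print(A.size + B.size)           # 28 trainable params = r * (d_in + d_out)
```

Raising the rank `r` is the straightforward way to add trainable parameters in this dense setup, since the adapter cost grows as `r * (d_in + d_out)`.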

Contribution

The paper proposes SiRA, a novel sparse mixture-of-experts approach with expert dropout that enhances low-rank adaptation for parameter-efficient tuning of language models.

Findings

01

SiRA outperforms LoRA and other mixture-of-experts methods across various tasks.

02

Sparse expert routing with capacity limits improves model efficiency.

03

Expert dropout reduces overfitting in the proposed method.
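
To make "sparse expert routing with capacity limits" concrete, here is a minimal sketch of top-$k$ routing where each expert accepts at most `capacity` tokens. It assumes tokens are assigned in sequence order and that a token routed to a full expert simply skips that expert, a convention borrowed from GShard/Switch-style routing; the paper's exact overflow policy may differ, and `topk_capacity_routing` is a hypothetical name.

```python
import numpy as np

def topk_capacity_routing(logits, k, capacity):
    """Return a (tokens, num_experts) 0/1 dispatch mask.

    Each token is sent to its k highest-scoring experts, but an expert stops
    accepting tokens once it has received `capacity` of them. Tokens are
    processed in order (an assumption); overflow assignments are dropped.
    """
    num_tokens, num_experts = logits.shape
    topk = np.argsort(-logits, axis=1)[:, :k]  # k best experts per token
    load = np.zeros(num_experts, dtype=int)     # tokens accepted per expert
    mask = np.zeros_like(logits, dtype=int)
    for t in range(num_tokens):
        for e in topk[t]:
            if load[e] < capacity:
                mask[t, e] = 1
                load[e] += 1
    return mask

logits = np.array([[3.0, 2.0, 0.0, -1.0],
                   [2.5, 0.1, 1.0,  0.0],
                   [4.0, 0.0, 0.0,  3.0],   # expert 0 is already full here
                   [0.0, 1.0, 2.0,  3.0]])
mask = topk_capacity_routing(logits, k=2, capacity=2)
print(mask.sum(axis=0))  # [2 1 2 2]
```

In the example, token 2 prefers expert 0, but expert 0 has already hit its capacity of 2, so that token is processed only by expert 3.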

Abstract

Parameter-efficient tuning has been a prominent approach for adapting large language models to downstream tasks. Most previous works consider adding dense trainable parameters, where all parameters are used to adapt a given task. Using LoRA as an example, we found empirically that this is less effective: introducing more trainable parameters does not help. Motivated by this, we investigate the importance of leveraging "sparse" computation and propose SiRA: sparse mixture of low-rank adaptation. SiRA leverages the Sparse Mixture of Experts (SMoE) to boost the performance of LoRA. Specifically, it enforces top-$k$ expert routing with a capacity limit restricting the maximum number of tokens each expert can process. We propose a novel and simple expert dropout on top of the gating network to reduce over-fitting. Through extensive experiments, we verify SiRA performs better than LoRA…
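
The abstract's pieces (LoRA experts, a gating network, top-$k$ routing, and expert dropout on the gate) can be combined into one layer. The sketch below is a simplified reading under stated assumptions: the capacity limit is omitted for brevity, selected experts are weighted by their raw gate probabilities without renormalization, and "expert dropout" is modeled as inverted dropout on the gate probabilities before top-$k$ selection. That dropout placement is a guess, not the paper's confirmed formulation, and all names are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def sira_like_forward(x, W0, A, B, Wg, k, alpha, expert_dropout=0.0,
                      training=False, rng=None):
    """Sparse mixture of LoRA experts (simplified sketch, no capacity limit).

    x:  (T, d_in)          input tokens
    W0: (d_out, d_in)      frozen pretrained weight
    A:  (E, r, d_in)       per-expert LoRA down-projections
    B:  (E, d_out, r)      per-expert LoRA up-projections
    Wg: (d_in, E)          gating network weights
    """
    num_experts, r, _ = A.shape
    gates = softmax(x @ Wg)  # (T, E) routing probabilities
    if training and expert_dropout > 0.0:
        # Hypothetical expert dropout: zero whole gate entries (inverted dropout),
        # so a dropped expert cannot win top-k for that token in this step.
        keep = rng.random(gates.shape) >= expert_dropout
        gates = gates * keep / (1.0 - expert_dropout)
    topk = np.argsort(-gates, axis=1)[:, :k]  # (T, k) selected experts
    out = x @ W0.T                            # frozen base projection
    scaling = alpha / r
    for t in range(x.shape[0]):
        for e in topk[t]:
            delta = scaling * (B[e] @ (A[e] @ x[t]))  # expert e's low-rank update
            out[t] += gates[t, e] * delta
    return out

rng = np.random.default_rng(0)
T, d_in, d_out, r, E = 5, 8, 6, 2, 4
x = rng.standard_normal((T, d_in))
W0 = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((E, r, d_in)) * 0.1
B = rng.standard_normal((E, d_out, r)) * 0.1
Wg = rng.standard_normal((d_in, E))

h_eval = sira_like_forward(x, W0, A, B, Wg, k=2, alpha=16)
h_train = sira_like_forward(x, W0, A, B, Wg, k=2, alpha=16,
                            expert_dropout=0.5, training=True, rng=rng)
print(h_eval.shape, h_train.shape)  # (5, 6) (5, 6)
```

With `k` equal to the number of experts and no dropout, this reduces to a dense, gate-weighted sum of all LoRA experts; choosing a smaller `k` is what makes the computation sparse per token.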

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Multimodal Machine Learning Applications

Methods: Dropout