MoS: Unleashing Parameter Efficiency of Low-Rank Adaptation with Mixture of Shards

Sheng Wang; Liheng Chen; Pengan Chen; Jingwei Dong; Boyang Xue; Jiyue Jiang; Lingpeng Kong; Chuan Wu

arXiv:2410.00938·cs.LG·February 18, 2025

TL;DR

This paper introduces MoS, a novel parameter-efficient finetuning method for large language models that combines intra- and inter-layer sharing with differentiation strategies, achieving approximately 8x parameter savings over standard LoRA.

Contribution

MoS is a new method that enhances parameter efficiency in low-rank adaptation by integrating multiple sharing schemes and differentiation techniques, surpassing existing sharing approaches.

Findings

01. Achieves approximately 8x parameter savings compared to standard LoRA.

02. Effectively combines intra- and inter-layer sharing with differentiation strategies.

03. Demonstrates the importance of each component through ablation studies.

Abstract

The rapid scaling of large language models necessitates more lightweight finetuning methods to reduce the explosive GPU memory overhead when numerous customized models are served simultaneously. Targeting more parameter-efficient low-rank adaptation (LoRA), parameter sharing presents a promising solution. Empirically, our research into high-level sharing principles highlights the indispensable role of differentiation in reversing the detrimental effects of pure sharing. Guided by this finding, we propose Mixture of Shards (MoS), incorporating both inter-layer and intra-layer sharing schemes, and integrating four nearly cost-free differentiation strategies, namely subset selection, pair dissociation, vector sharding, and shard privatization. Briefly, it selects a designated number of shards from global pools with a Mixture-of-Experts (MoE)-like routing mechanism before sequentially…
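
The selection step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the pool sizes, the fixed random index lists standing in for the MoE-like routing, and the helper names (`make_pool`, `make_routes`, `build_matrix`) are all hypothetical. The abstract is cut off after "sequentially", so stacking the selected shards row by row into a low-rank matrix is an assumption made only for illustration. Drawing separate index lists for the two LoRA matrices is one possible reading of "pair dissociation"; vector sharding and shard privatization are not modeled.

```python
import random

def make_pool(num_shards, shard_dim, rng):
    """Global pool: num_shards vectors of length shard_dim, shared by all layers."""
    return [[rng.gauss(0.0, 0.02) for _ in range(shard_dim)]
            for _ in range(num_shards)]

def make_routes(num_layers, rank, pool_size, rng):
    """One fixed index list per layer: which `rank` shards that layer selects."""
    return [rng.sample(range(pool_size), rank) for _ in range(num_layers)]

def build_matrix(pool, route):
    """Assumed combination step: stack selected shards into a (rank x dim) matrix."""
    return [pool[i] for i in route]

rng = random.Random(0)
# Hypothetical sizes, chosen only to keep the example small.
num_layers, rank, d_in, d_out, pool_size = 4, 8, 16, 16, 12

pool_a = make_pool(pool_size, d_in, rng)    # shards for the down-projection A
pool_b = make_pool(pool_size, d_out, rng)   # shards for the up-projection B

# Independent index lists for A and B (one reading of "pair dissociation").
routes_a = make_routes(num_layers, rank, pool_size, rng)
routes_b = make_routes(num_layers, rank, pool_size, rng)

# Every layer draws its A and B from the same two global pools.
layers = [(build_matrix(pool_a, ra), build_matrix(pool_b, rb))
          for ra, rb in zip(routes_a, routes_b)]

# Parameter counts: per-layer LoRA matrices versus the two shared pools.
lora_params = num_layers * rank * (d_in + d_out)
shared_params = pool_size * (d_in + d_out)
print(lora_params, shared_params)
```

With these toy sizes the shared pools hold fewer parameters than per-layer LoRA matrices of the same rank, but the ratio depends entirely on the chosen pool size and says nothing about the roughly 8x saving reported for MoS.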

Taxonomy

Topics: Machine Learning and ELM