MoS: Unleashing Parameter Efficiency of Low-Rank Adaptation with Mixture of Shards
Sheng Wang, Liheng Chen, Pengan Chen, Jingwei Dong, Boyang Xue, Jiyue Jiang, Lingpeng Kong, Chuan Wu

TL;DR
This paper introduces MoS, a novel parameter-efficient finetuning method for large language models that combines intra- and inter-layer sharing with differentiation strategies, achieving approximately 8x parameter savings over standard LoRA.
Contribution
MoS is a new method that enhances parameter efficiency in low-rank adaptation by integrating multiple sharing schemes and differentiation techniques, surpassing existing sharing approaches.
Findings
Achieves approximately 8x parameter savings compared to standard LoRA.
Effectively combines intra- and inter-layer sharing with differentiation strategies.
Demonstrates the importance of each component through ablation studies.
Abstract
The rapid scaling of large language models necessitates more lightweight finetuning methods to reduce the explosive GPU memory overhead when numerous customized models are served simultaneously. Targeting more parameter-efficient low-rank adaptation (LoRA), parameter sharing presents a promising solution. Empirically, our research into high-level sharing principles highlights the indispensable role of differentiation in reversing the detrimental effects of pure sharing. Guided by this finding, we propose Mixture of Shards (MoS), incorporating both inter-layer and intra-layer sharing schemes, and integrating four nearly cost-free differentiation strategies, namely subset selection, pair dissociation, vector sharding, and shard privatization. Briefly, it selects a designated number of shards from global pools with a Mixture-of-Experts (MoE)-like routing mechanism before sequentially…
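The abstract is cut off before it finishes describing how the selected shards are used, so the following is only a minimal illustrative sketch of the general idea of sharing a global pool of vectors across layers, not the authors' implementation. It assumes that each layer picks a fixed subset of pool rows by index (a stand-in for the MoE-like routing) and stacks them into its low-rank matrices. The names `pool_a`, `pool_b`, `select_shards`, and `delta_w`, and all sizes, are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen only for illustration.
num_layers, dim, rank, pool_size = 8, 64, 4, 8

# Global pools shared by every layer. The B pool starts at zero so the
# initial weight update is zero, following the usual LoRA initialization.
pool_a = rng.normal(size=(pool_size, dim))
pool_b = np.zeros((pool_size, dim))

# Assumption: each layer keeps a fixed set of `rank` shard indices per pool.
idx_a = [rng.choice(pool_size, size=rank, replace=False) for _ in range(num_layers)]
idx_b = [rng.choice(pool_size, size=rank, replace=False) for _ in range(num_layers)]


def select_shards(pool, indices):
    """Gather the chosen pool rows into one (len(indices), dim) matrix."""
    return pool[indices]


def delta_w(layer):
    """Low-rank update B @ A for one layer, built from shared shards."""
    a = select_shards(pool_a, idx_a[layer])      # (rank, dim)
    b = select_shards(pool_b, idx_b[layer]).T    # (dim, rank)
    return b @ a                                 # (dim, dim)


# Trainable parameter counts: per-layer LoRA versus the shared pools.
lora_params = num_layers * 2 * rank * dim
shared_params = 2 * pool_size * dim
```

With these toy sizes the shared pools hold a quarter of the parameters that per-layer LoRA would need. The ratio depends entirely on the chosen pool size, so it should not be read as reproducing the savings reported in the paper.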
Taxonomy
Topics: Machine Learning and ELM
