DR-LoRA: Dynamic Rank LoRA for Fine-Tuning Mixture-of-Experts Models
Guanzhi Deng, Bo Li, Ronghao Chen, Xiujin Liu, Zhuo Han, Huacan Wang, Lijie Wen, Linqi Song

TL;DR
DR-LoRA dynamically allocates different LoRA ranks to the expert modules of an MoE model according to their relevance to the task, improving fine-tuning efficiency and performance.
Contribution
The paper proposes a framework that dynamically adjusts each expert's LoRA rank based on saliency scores, addressing the resource mismatch that uniform rank allocation causes in MoE fine-tuning.
Findings
DR-LoRA outperforms standard LoRA and other baselines across multiple tasks.
Task-adaptive rank allocation improves capacity utilization.
Experiments validate effectiveness on three MoE models.
Abstract
Mixture-of-Experts (MoE) has become a prominent paradigm for scaling Large Language Models (LLMs). Parameter-efficient fine-tuning methods, such as LoRA, are widely adopted to adapt pretrained MoE LLMs to downstream tasks. However, existing approaches typically assign identical LoRA ranks to all expert modules, ignoring the heterogeneous specialization of pretrained experts. This uniform allocation leads to a resource mismatch: task-relevant experts are under-provisioned, while less relevant ones receive redundant parameters. To address this, we propose DR-LoRA, a Dynamic Rank LoRA framework for fine-tuning pretrained MoE models. Specifically, DR-LoRA initializes all expert LoRA modules with a small active rank and uses an expert saliency score, which combines routing frequency and gradient-based rank importance, to identify which experts would benefit most from additional capacity. It…
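To make the idea of saliency-driven rank allocation concrete, here is a minimal Python sketch. It is not the authors' implementation. The abstract says only that the saliency score combines routing frequency and gradient-based rank importance. The product used to combine them, the greedy growth rule, and every name here (`saliency_scores`, `grow_ranks`, `budget`, `step`, `max_rank`) are assumptions made for illustration.

```python
def saliency_scores(routing_freq, grad_importance):
    """Per-expert saliency as the product of the two signals.

    ASSUMPTION: the product is an illustrative choice; the abstract does not
    give the exact combination rule.
    """
    return [f * g for f, g in zip(routing_freq, grad_importance)]


def grow_ranks(active_ranks, scores, max_rank, budget, step=1):
    """Greedily hand out `budget` extra rank units, highest saliency first.

    ASSUMPTION: the greedy rule and the `budget`/`step` parameters are
    hypothetical and are not taken from the paper.
    """
    ranks = list(active_ranks)
    order = sorted(range(len(ranks)), key=lambda i: scores[i], reverse=True)
    remaining = budget
    for i in order:
        if remaining <= 0:
            break
        # Never exceed the per-expert cap or the remaining budget.
        add = min(step, max_rank - ranks[i], remaining)
        if add > 0:
            ranks[i] += add
            remaining -= add
    return ranks


# Four experts, all starting from the same small active rank.
routing_freq = [0.50, 0.30, 0.15, 0.05]     # fraction of tokens routed to each expert
grad_importance = [0.20, 0.90, 0.40, 0.10]  # hypothetical gradient-based importance
scores = saliency_scores(routing_freq, grad_importance)
new_ranks = grow_ranks([2, 2, 2, 2], scores, max_rank=8, budget=4, step=2)
print(new_ranks)
```

In this toy setup the second expert is routed to less often than the first but has a much higher gradient-based importance, so it ranks first for extra capacity. That is the kind of mismatch a uniform allocation cannot express.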
