Adaptive and Fine-grained Module-wise Expert Pruning for Efficient LoRA-MoE Fine-Tuning
Weihang Li, Jianchun Liu, Hongli Xu

TL;DR
DMEP introduces dynamic expert pruning in LoRA-MoE fine-tuning, reducing parameters and improving training efficiency by tailoring expert structures to module needs without sacrificing accuracy.
Contribution
It proposes a novel framework that adaptively prunes experts per module based on utility, enhancing parameter efficiency and specialization in LoRA-MoE models.
Findings
Reduces trainable parameters by 35-43%
Improves training throughput by about 10%
Maintains or surpasses baseline reasoning accuracy
Abstract
LoRA-MoE has emerged as an effective paradigm for parameter-efficient fine-tuning, combining the low training cost of LoRA with the increased adaptation capacity of Mixture-of-Experts (MoE). However, existing LoRA-MoE frameworks typically adopt a fixed and uniform expert configuration across heterogeneous Transformer modules (e.g., attention query/key projections and MLP gating networks), ignoring their distinct functional roles and capacity requirements. This design leads to localized over-provisioning, redundant trainable parameters, and unnecessary optimizer-state overhead. Moreover, prior methods enforce load balancing among experts throughout training. Although beneficial in the early stage, this constraint becomes restrictive once routing patterns stabilize, limiting expert specialization on downstream tasks. In this paper, we propose DMEP, a novel LoRA-MoE fine-tuning framework…
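The idea of pruning experts per module according to their utility can be illustrated with a small sketch. The abstract above is truncated and does not state how DMEP measures utility or picks a pruning threshold, so everything here is an assumption made for illustration: utility is taken to be the mean routing probability an expert receives, the cutoff is a fraction of the uniform share, and the names `expert_utility`, `prune_experts`, `lora_params`, and `rel_threshold` are hypothetical, as are the toy routing values.

```python
from typing import Dict, List


def expert_utility(gate_probs: List[List[float]]) -> List[float]:
    """Mean routing probability per expert over a batch of tokens.

    Assumption: this is a stand-in utility score, not the paper's metric.
    gate_probs[t][e] is the router probability of expert e for token t.
    """
    n_tokens = len(gate_probs)
    n_experts = len(gate_probs[0])
    return [sum(row[e] for row in gate_probs) / n_tokens for e in range(n_experts)]


def prune_experts(utility: List[float], rel_threshold: float = 0.5) -> List[int]:
    """Return indices of experts to keep in one module.

    Assumption: an expert is kept if its utility is at least
    rel_threshold times the uniform share 1 / n_experts.
    """
    cutoff = rel_threshold / len(utility)
    kept = [e for e, u in enumerate(utility) if u >= cutoff]
    if not kept:
        # Never leave a module with no expert: keep the most useful one.
        kept = [max(range(len(utility)), key=utility.__getitem__)]
    return kept


def lora_params(n_experts: int, d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters of n_experts LoRA experts (A: rank x d_in, B: d_out x rank)."""
    return n_experts * rank * (d_in + d_out)


# Toy routing statistics for two modules (made-up numbers, two tokens each).
gate_probs: Dict[str, List[List[float]]] = {
    "attn.q_proj": [[0.70, 0.25, 0.03, 0.02], [0.60, 0.35, 0.03, 0.02]],
    "mlp.gate_proj": [[0.30, 0.25, 0.25, 0.20], [0.20, 0.30, 0.25, 0.25]],
}

# Each module is pruned independently, so modules can end up with different expert counts.
kept = {name: prune_experts(expert_utility(p)) for name, p in gate_probs.items()}

for name, experts in kept.items():
    print(name, "keeps experts", experts)
```

In this toy setting the module with skewed routing keeps fewer experts than the module with near-uniform routing, which is the kind of per-module variation a uniform expert configuration cannot express. The trainable-parameter count of each module then scales with the number of experts it keeps, as `lora_params` shows.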
