MoE-Sieve: Routing-Guided LoRA for Efficient MoE Fine-Tuning

Andrea Manzoni

arXiv:2603.24044 · cs.LG · March 26, 2026

TL;DR

MoE-Sieve is a routing-guided approach that applies LoRA only to the most-routed experts in each layer of an MoE model, reducing trainable parameters and training time while remaining competitive with full LoRA.

Contribution

The paper proposes MoE-Sieve, a simple routing-guided framework for LoRA fine-tuning that adapts only the most-routed experts in each layer, improving efficiency while remaining competitive with full LoRA.

Findings

01

Tuning only the top 25% most-routed experts per layer stays within +/-1 percentage point of full LoRA on average.

02

LoRA trainable parameters and training time are reduced by over 70%.

03

The routing signal is crucial for effective expert selection.

Abstract

Standard LoRA fine-tuning of Mixture-of-Experts (MoE) models applies adapters to every expert, yet our profiling shows that per-layer expert routing is highly skewed: a small subset of experts handles most tokens in each layer, while many others are rarely activated ("cold"). We propose MoE-Sieve, a simple routing-guided framework for LoRA fine-tuning, and pair it with a systematic profiling study of expert routing across architectures and tasks. The method is simple: profile routing counts on a small calibration set, select the top-k most-routed experts per layer, and apply LoRA only to those experts. Across two architecturally distinct MoE models and three diverse tasks, tuning only the top 25% routed experts per layer remains competitive with full LoRA, with mean differences within +/-1 percentage point across all conditions. This reduces LoRA trainable parameters by 70-73%, adapter…
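
The selection step described in the abstract (profile routing counts, then keep the top-k most-routed experts per layer) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the `(layer, expert)` trace format, the tie-breaking rule, and the toy counts are all hypothetical, and attaching LoRA adapters to the selected experts is left out.

```python
from collections import Counter
from math import ceil


def profile_routing(routing_trace, num_layers):
    """Count how often each expert is selected in each layer.

    routing_trace: iterable of (layer_index, expert_index) pairs, one per
    token-to-expert assignment seen on the calibration set (assumed format).
    """
    counts = [Counter() for _ in range(num_layers)]
    for layer, expert in routing_trace:
        counts[layer][expert] += 1
    return counts


def select_top_experts(counts, num_experts, fraction=0.25):
    """Return, per layer, the sorted indices of the most-routed experts."""
    k = max(1, ceil(fraction * num_experts))
    selected = []
    for layer_counts in counts:
        # Rank by descending count; ties go to the lower expert index
        # (an arbitrary choice made here for determinism).
        ranked = sorted(range(num_experts), key=lambda e: (-layer_counts[e], e))
        selected.append(sorted(ranked[:k]))
    return selected


def coverage(counts, selected):
    """Fraction of routed assignments per layer handled by the selected experts."""
    result = []
    for layer_counts, experts in zip(counts, selected):
        total = sum(layer_counts.values())
        kept = sum(layer_counts[e] for e in experts)
        result.append(kept / total if total else 0.0)
    return result


# Toy calibration trace: 2 layers, 8 experts, deliberately skewed routing.
trace = (
    [(0, 3)] * 6 + [(0, 1)] * 3 + [(0, 0)]
    + [(1, 5)] * 5 + [(1, 2)] * 4 + [(1, 7)]
)
counts = profile_routing(trace, num_layers=2)
selected = select_top_experts(counts, num_experts=8, fraction=0.25)
print(selected)                    # [[1, 3], [2, 5]]
print(coverage(counts, selected))  # [0.9, 0.9]
```

In this toy trace, keeping 2 of 8 experts per layer covers 90% of the routed assignments, which is the kind of skew the abstract describes. LoRA would then be applied only to the experts listed in `selected` for each layer.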

Taxonomy

Topics: Mobile Crowdsensing and Crowdsourcing · Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques