Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models
Zihan Wang, Deli Chen, Damai Dai, Runxin Xu, Zhuoshu Li, Y. Wu

TL;DR
This paper introduces Expert-Specialized Fine-Tuning (ESFT), a method for efficiently fine-tuning sparse-architecture LLMs built on Mixture-of-Experts (MoE). By tuning only the experts relevant to a task, it achieves performance comparable to or better than full fine-tuning.
Contribution
The paper proposes ESFT, a PEFT method for MoE-based LLMs that improves efficiency and performance by tuning only task-relevant experts. It also analyzes expert activation patterns across tasks.
Findings
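ESFT improves tuning efficiency while matching or exceeding the performance of full fine-tuning.
For a given task, routing is highly concentrated on a few experts, while the set of activated experts varies significantly across tasks.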
Finer-grained experts in MoE models enhance expert selection and task adaptation.
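The routing findings above can be illustrated with a small sketch. It assumes that routing is logged as a flat list of expert ids, one per token-to-expert assignment, and that concentration is measured as the share of assignments captured by the k most-used experts. The function names, the toy logs, and the choice of metric are hypothetical, not taken from the paper.

```python
from collections import Counter


def expert_token_share(routed_expert_ids, num_experts):
    """Return the fraction of routing assignments that went to each expert."""
    if not routed_expert_ids:
        raise ValueError("routing log is empty")
    counts = Counter(routed_expert_ids)
    total = len(routed_expert_ids)
    return [counts.get(e, 0) / total for e in range(num_experts)]


def top_k_mass(shares, k):
    """Return the share of routing mass captured by the k most-used experts."""
    return sum(sorted(shares, reverse=True)[:k])


# Toy routing logs for two tasks (made-up data, for illustration only).
math_routes = [0, 0, 0, 1, 0, 1, 0, 0, 5, 0]
code_routes = [3, 3, 4, 3, 3, 4, 3, 3, 3, 7]

math_share = expert_token_share(math_routes, num_experts=8)
code_share = expert_token_share(code_routes, num_experts=8)

# Each task concentrates on a few experts, but on different ones.
print("math top-2 mass:", top_k_mass(math_share, k=2))
print("code top-2 mass:", top_k_mass(code_share, k=2))
print("most-used expert per task:",
      math_share.index(max(math_share)),
      code_share.index(max(code_share)))
```

In this toy data both tasks put most of their routing mass on two experts, yet the most-used expert differs between them, which is the pattern the findings describe.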
Abstract
Parameter-efficient fine-tuning (PEFT) is crucial for customizing Large Language Models (LLMs) with constrained resources. Although there have been various PEFT methods for dense-architecture LLMs, PEFT for sparse-architecture LLMs is still underexplored. In this work, we study PEFT for LLMs with the Mixture-of-Experts (MoE) architecture, and the contributions of this work are mainly threefold: (1) We investigate the dispersion degree of the activated experts in customized tasks, and find that the routing distribution for a specific task tends to be highly concentrated, while the distribution of activated experts varies significantly across different tasks. (2) We propose Expert-Specialized Fine-Tuning, or ESFT, which tunes the experts most relevant to downstream tasks while freezing the other experts and modules; experimental results demonstrate that our method not only improves…
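The idea of tuning the most relevant experts while freezing everything else can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: relevance is taken to be each expert's share of routed tokens, experts are added until their cumulative share reaches a hypothetical `threshold`, and the parameter naming scheme (`layers.<i>.experts.<j>.`) is invented for the example.

```python
def select_experts(shares, threshold):
    """Pick the most-used experts until their cumulative share reaches threshold."""
    ranked = sorted(range(len(shares)), key=lambda e: shares[e], reverse=True)
    selected, cumulative = [], 0.0
    for expert in ranked:
        if cumulative >= threshold:
            break
        selected.append(expert)
        cumulative += shares[expert]
    return sorted(selected)


def trainable_flags(param_names, selected, layer):
    """Map each parameter name to True (tune) or False (freeze)."""
    # The trailing dot keeps expert 1 from also matching expert 10, 11, ...
    prefixes = tuple(f"layers.{layer}.experts.{e}." for e in selected)
    return {name: name.startswith(prefixes) for name in param_names}


# Hypothetical per-expert routing shares for one layer on one task.
shares = [0.6, 0.05, 0.25, 0.1]
selected = select_experts(shares, threshold=0.8)

# Hypothetical parameter names for a single MoE layer.
param_names = [
    "layers.0.attn.q_proj.weight",
    "layers.0.router.weight",
    "layers.0.experts.0.up.weight",
    "layers.0.experts.1.up.weight",
    "layers.0.experts.2.up.weight",
    "layers.0.experts.3.up.weight",
]
flags = trainable_flags(param_names, selected, layer=0)

for name, tune in flags.items():
    print(("tune  " if tune else "freeze"), name)
```

In a PyTorch model, flags like these would typically be applied by setting `requires_grad` on each named parameter, so that only the selected experts receive gradient updates.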
Taxonomy
Topics: Topic Modeling
Methods: Mixture of Experts
