Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models

Zihan Wang; Deli Chen; Damai Dai; Runxin Xu; Zhuoshu Li; Y. Wu

arXiv:2407.01906 · cs.CL · July 8, 2024

TL;DR

This paper introduces Expert-Specialized Fine-Tuning (ESFT), a method for efficiently fine-tuning sparse-architecture LLMs built on Mixture-of-Experts. By tuning only the experts relevant to a task, it matches or exceeds the performance of full fine-tuning.

Contribution

The paper proposes ESFT, a PEFT method for MoE-based LLMs that tunes only task-relevant experts, improving both efficiency and performance. It also analyzes how expert activation patterns differ across tasks.

Findings

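01. ESFT improves tuning efficiency and matches or exceeds the performance of full fine-tuning.

02. The routing distribution within a single task is highly concentrated on a few experts, while the set of activated experts varies significantly across tasks.

03. Finer-grained experts in MoE models improve expert selection and task adaptation.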
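
The second finding can be made concrete with a small measurement. The following is a minimal sketch, not the paper's code: it assumes routing decisions are available as one list of selected expert ids per token, and the function names, the top-k share metric, and the toy traces are all hypothetical choices for illustration.

```python
from collections import Counter


def expert_selection_ratio(routed_experts, num_experts):
    """Fraction of routing slots assigned to each expert.

    routed_experts: one list of selected expert ids per token
    (a hypothetical trace format assumed for this sketch).
    """
    counts = Counter(e for token in routed_experts for e in token)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * num_experts
    return [counts.get(e, 0) / total for e in range(num_experts)]


def top_k_share(ratios, k):
    """Share of routing mass captured by the k most-used experts."""
    return sum(sorted(ratios, reverse=True)[:k])


# Made-up routing traces for two tasks (top-2 routing over 6 experts).
math_trace = [[0, 1], [0, 1], [0, 2], [0, 1]]
code_trace = [[3, 4], [3, 5], [3, 4], [4, 5]]

math_ratios = expert_selection_ratio(math_trace, num_experts=6)
code_ratios = expert_selection_ratio(code_trace, num_experts=6)
```

In this toy data each task concentrates its routing on a few experts, and the two tasks use disjoint expert sets, which is the pattern the finding describes. A high `top_k_share` for small `k` indicates concentrated routing.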

Abstract

Parameter-efficient fine-tuning (PEFT) is crucial for customizing Large Language Models (LLMs) with constrained resources. Although there have been various PEFT methods for dense-architecture LLMs, PEFT for sparse-architecture LLMs is still underexplored. In this work, we study the PEFT method for LLMs with the Mixture-of-Experts (MoE) architecture and the contents of this work are mainly threefold: (1) We investigate the dispersion degree of the activated experts in customized tasks, and find that the routing distribution for a specific task tends to be highly concentrated, while the distribution of activated experts varies significantly across different tasks. (2) We propose Expert-Specialized Fine-Tuning, or ESFT, which tunes the experts most relevant to downstream tasks while freezing the other experts and modules; experimental results demonstrate that our method not only improves…
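
The idea in point (2), tuning only the most relevant experts and freezing everything else, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the cumulative-relevance cutoff (`threshold`), the per-expert relevance scores, and the parameter naming scheme `layers.<i>.experts.<j>.` are all hypothetical.

```python
import re

# Hypothetical parameter naming scheme, e.g. "layers.0.experts.3.weight".
EXPERT_PATTERN = re.compile(r"layers\.(\d+)\.experts\.(\d+)\.")


def select_experts(relevance, threshold):
    """Smallest set of highest-scoring experts whose summed relevance
    reaches `threshold` (an assumed selection rule for this sketch)."""
    order = sorted(range(len(relevance)), key=lambda e: relevance[e], reverse=True)
    chosen, cumulative = [], 0.0
    for e in order:
        if cumulative >= threshold:
            break
        chosen.append(e)
        cumulative += relevance[e]
    return sorted(chosen)


def trainable_flags(param_names, chosen_by_layer):
    """Mark a parameter trainable only if it belongs to a selected expert.

    Attention, router, and unselected expert parameters stay frozen.
    """
    flags = {}
    for name in param_names:
        m = EXPERT_PATTERN.search(name)
        flags[name] = bool(m) and int(m.group(2)) in chosen_by_layer.get(int(m.group(1)), [])
    return flags


# Made-up relevance scores for one MoE layer with four experts.
chosen = select_experts([0.5, 0.375, 0.125, 0.0], threshold=0.8)
names = [
    "layers.0.attn.weight",
    "layers.0.router.weight",
    "layers.0.experts.0.weight",
    "layers.0.experts.2.weight",
]
flags = trainable_flags(names, {0: chosen})
```

In a real training loop the resulting flags would typically be applied by toggling gradient tracking on each parameter, so that the optimizer only updates the selected experts.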

Code & Models

Repositories

deepseek-ai/esft
PyTorch · Official

Models

🤗 deepseek-ai/ESFT-vanilla-lite
model · 409 dl · ♡ 19

Videos

Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models· underline

Taxonomy

Topics: Topic Modeling

Methods: Mixture of Experts