PERFT: Parameter-Efficient Routed Fine-Tuning for Mixture-of-Expert Model

Yilun Liu; Yunpu Ma; Shuo Chen; Zifeng Ding; Bailan He; Zhen Han; Volker Tresp

arXiv:2411.08212·cs.LG·November 14, 2024

TL;DR

This paper introduces PERFT, a flexible framework for parameter-efficient fine-tuning of Mixture-of-Experts models, enhancing scalability and effectiveness in reasoning tasks.

Contribution

It presents a unified, scalable PEFT framework tailored for MoE models, with extensive experiments demonstrating its effectiveness.

Findings

01 PERFT improves fine-tuning efficiency for MoE models.

02 Experimental results show PERFT's scalability and effectiveness.

03 Design choices in PERFT influence model performance.

Abstract

The Mixture-of-Experts (MoE) paradigm has emerged as a powerful approach for scaling transformers with improved resource utilization. However, efficiently fine-tuning MoE models remains largely underexplored. Inspired by recent works on Parameter-Efficient Fine-Tuning (PEFT), we present a unified framework for integrating PEFT modules directly into the MoE mechanism. Aligning with the core principles and architecture of MoE, our framework encompasses a set of design dimensions including various functional and composition strategies. By combining design choices within our framework, we introduce Parameter-Efficient Routed Fine-Tuning (PERFT) as a flexible and scalable family of PEFT strategies tailored for MoE models. Extensive experiments on adapting OLMoE-1B-7B and Mixtral-8×7B for commonsense and arithmetic reasoning tasks demonstrate the effectiveness, scalability, and…
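To make the idea of routing PEFT modules alongside an MoE block concrete, here is a minimal sketch in plain Python. It is one possible reading of the abstract, not the authors' implementation: the class `RoutedAdapters`, the helper `top_k_gates`, the stand-in `frozen_moe`, the low-rank adapter form, the zero initialization, and all sizes are assumptions chosen for illustration.

```python
import math
import random

def matvec(m, v):
    """Multiply matrix m (rows x cols) by vector v (length cols)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def top_k_gates(logits, k):
    """Softmax over the k largest logits; all other gates are exactly 0."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)
    exps = {i: math.exp(logits[i] - peak) for i in top}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(logits))]

def rand_matrix(rows, cols, rng, scale=0.02):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

class RoutedAdapters:
    """Hypothetical set of low-rank adapters with their own trainable router."""

    def __init__(self, d_model, n_adapters, rank, top_k, rng):
        self.top_k = top_k
        self.router = rand_matrix(n_adapters, d_model, rng)
        # Each adapter is a down-projection followed by an up-projection.
        self.down = [rand_matrix(rank, d_model, rng) for _ in range(n_adapters)]
        # Assumed zero init for the up-projection, so the adapted model
        # starts out identical to the frozen one.
        self.up = [[[0.0] * rank for _ in range(d_model)] for _ in range(n_adapters)]

    def __call__(self, x):
        gates = top_k_gates(matvec(self.router, x), self.top_k)
        out = [0.0] * len(x)
        for gate, down, up in zip(gates, self.down, self.up):
            if gate == 0.0:
                continue  # adapter not selected for this token
            delta = matvec(up, matvec(down, x))
            out = [o + gate * d for o, d in zip(out, delta)]
        return out

def frozen_moe(x):
    """Stand-in for the output of the pretrained, frozen MoE block."""
    return [math.tanh(v) for v in x]

rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(8)]  # one token's hidden state
adapters = RoutedAdapters(d_model=8, n_adapters=4, rank=2, top_k=2, rng=rng)
# The routed adapter output is added in parallel to the frozen MoE output.
y = [base + delta for base, delta in zip(frozen_moe(x), adapters(x))]
```

In this sketch only the adapter router and the low-rank projections would be trained, while `frozen_moe` stays fixed. Whether the adapters use their own router, share the pretrained one, or skip routing entirely is the kind of design dimension the abstract alludes to; the choice made here is arbitrary.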

Taxonomy

Topics: Gaussian Processes and Bayesian Inference · Target Tracking and Data Fusion in Sensor Networks · Distributed Sensor Networks and Detection Algorithms

Methods: Mixture of Experts · Sparse Evolutionary Training