PERFT: Parameter-Efficient Routed Fine-Tuning for Mixture-of-Expert Model
Yilun Liu, Yunpu Ma, Shuo Chen, Zifeng Ding, Bailan He, Zhen Han, Volker Tresp

TL;DR
This paper introduces PERFT, a flexible and scalable family of parameter-efficient fine-tuning strategies for Mixture-of-Experts models, evaluated on commonsense and arithmetic reasoning tasks.
Contribution
The paper presents a unified framework for integrating PEFT modules directly into the MoE mechanism, supported by extensive fine-tuning experiments on two MoE models.
Findings
PERFT improves fine-tuning efficiency for MoE models.
Experimental results show PERFT's scalability and effectiveness.
Design choices in PERFT influence model performance.
Abstract
The Mixture-of-Experts (MoE) paradigm has emerged as a powerful approach for scaling transformers with improved resource utilization. However, efficiently fine-tuning MoE models remains largely underexplored. Inspired by recent works on Parameter-Efficient Fine-Tuning (PEFT), we present a unified framework for integrating PEFT modules directly into the MoE mechanism. Aligning with the core principles and architecture of MoE, our framework encompasses a set of design dimensions including various functional and composition strategies. By combining design choices within our framework, we introduce Parameter-Efficient Routed Fine-Tuning (PERFT) as a flexible and scalable family of PEFT strategies tailored for MoE models. Extensive experiments on adapting OLMoE-1B-7B and Mixtral-8×7B for commonsense and arithmetic reasoning tasks demonstrate the effectiveness, scalability, and…
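To make the idea of routing PEFT modules alongside an MoE layer concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes one possible design: a small trainable router picks the top-k of several low-rank adapter "experts", and their gated sum is added in parallel to the output of a frozen MoE block. The names `RoutedAdapters`, `top_k_gates`, and `adapted_moe_forward`, the zero-initialized up-projection, and all sizes are illustrative assumptions, not details taken from the abstract.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def top_k_gates(logits, k):
    """Softmax restricted to the k largest logits; returns {index: gate}."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - m) for i in top}
    z = sum(exps.values())
    return {i: e / z for i, e in exps.items()}


class RoutedAdapters:
    """Hypothetical routed PEFT module: a router plus low-rank adapter experts."""

    def __init__(self, d_model, rank, n_experts, k, seed=0):
        rng = random.Random(seed)

        def rand(rows, cols):
            return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]

        self.k = k
        self.router = rand(n_experts, d_model)  # trainable routing weights
        self.down = [rand(rank, d_model) for _ in range(n_experts)]  # d_model -> rank
        # Assumption: zero-initialized up-projections, so the update starts at zero.
        self.up = [[[0.0] * rank for _ in range(d_model)] for _ in range(n_experts)]

    def __call__(self, x):
        gates = top_k_gates(matvec(self.router, x), self.k)
        out = [0.0] * len(x)
        for i, g in gates.items():
            h = matvec(self.up[i], matvec(self.down[i], x))  # low-rank bottleneck
            out = [o + g * v for o, v in zip(out, h)]
        return out


def adapted_moe_forward(frozen_moe, adapters, x):
    """Add the routed adapter update in parallel to the frozen MoE output."""
    return [y + dy for y, dy in zip(frozen_moe(x), adapters(x))]
```

In this sketch only `router`, `down`, and `up` would be trained, while `frozen_moe` stands in for the pretrained MoE block and stays fixed. Because the up-projections start at zero, the adapted layer initially reproduces the frozen layer's output exactly.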
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Target Tracking and Data Fusion in Sensor Networks · Distributed Sensor Networks and Detection Algorithms
Methods: Mixture of Experts · Sparse Evolutionary Training
