PT-MoE: An Efficient Finetuning Framework for Integrating Mixture-of-Experts into Prompt Tuning

Zongqian Li; Yixuan Su; Nigel Collier

arXiv:2505.09519 · cs.CL · May 15, 2025


TL;DR

PT-MoE is a parameter-efficient fine-tuning framework that combines matrix decomposition with mixture-of-experts (MoE) routing for prompt tuning (PT). The authors report state-of-the-art results on question answering and mathematical problem solving across 17 datasets, using 25% fewer parameters than LoRA.

Contribution

The authors present PT-MoE as the first framework to integrate matrix decomposition with MoE routing in prompt tuning, improving both efficiency and performance across multiple tasks.

Findings

01. Improves F1 by 1.49 points over prompt tuning (PT) on question answering (QA) tasks.

02. Improves accuracy on mathematical problem solving by 10.75 points over PT.

03. Uses 25% fewer parameters than LoRA.

Abstract

Parameter-efficient fine-tuning (PEFT) methods have shown promise in adapting large language models, yet existing approaches exhibit counter-intuitive phenomena: integrating a router into prompt tuning (PT) increases training efficiency yet does not improve performance universally, and parameter reduction through matrix decomposition can improve performance in specific domains. Motivated by these observations and the modular nature of PT, we propose PT-MoE, a novel framework that integrates matrix decomposition with mixture-of-experts (MoE) routing for efficient PT. Results across 17 datasets demonstrate that PT-MoE achieves state-of-the-art performance in both question answering (QA) and mathematical problem solving tasks, improving F1 score by 1.49 points over PT and 2.13 points over LoRA in QA tasks, while enhancing mathematical accuracy by 10.75 points over PT and 0.44 points over LoRA,…
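The abstract names two ingredients, matrix decomposition and MoE routing, without giving the details on this page. The sketch below is one plausible way to combine them for a soft prompt; it is not taken from the paper or the official repository. Everything in it is an assumption made for illustration: the class name `RoutedLowRankPrompt`, the choice of one low-rank factor per expert with a single shared second factor, the mean-pooled linear router, and the dense softmax mixing.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng, scale=0.02):
    """Small random matrix standing in for trainable parameters."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class RoutedLowRankPrompt:
    """Hypothetical sketch: low-rank prompt experts mixed by a linear router."""

    def __init__(self, num_experts, prompt_len, rank, hidden, seed=0):
        rng = random.Random(seed)
        # Assumption: each expert owns a (prompt_len x rank) factor A_i.
        self.expert_a = [rand_matrix(prompt_len, rank, rng) for _ in range(num_experts)]
        # Assumption: a single (rank x hidden) factor B is shared by all experts.
        self.shared_b = rand_matrix(rank, hidden, rng)
        # Assumption: the router is one linear layer over the mean input embedding.
        self.router_w = rand_matrix(num_experts, hidden, rng)

    def route(self, token_embeddings):
        """Return one mixing weight per expert for this input."""
        count = len(token_embeddings)
        pooled = [sum(column) / count for column in zip(*token_embeddings)]
        scores = [sum(w * x for w, x in zip(row, pooled)) for row in self.router_w]
        return softmax(scores)

    def soft_prompt(self, token_embeddings):
        """Build the (prompt_len x hidden) prompt as (sum_i w_i A_i) B."""
        weights = self.route(token_embeddings)
        rows, cols = len(self.expert_a[0]), len(self.expert_a[0][0])
        # Mixing the small factors first needs only one product with B.
        mixed_a = [
            [sum(w * a[i][j] for w, a in zip(weights, self.expert_a)) for j in range(cols)]
            for i in range(rows)
        ]
        return matmul(mixed_a, self.shared_b)

    def num_parameters(self):
        """Count entries in the expert factors, the shared factor, and the router."""
        experts, prompt_len = len(self.expert_a), len(self.expert_a[0])
        rank, hidden = len(self.shared_b), len(self.shared_b[0])
        return experts * prompt_len * rank + rank * hidden + experts * hidden


model = RoutedLowRankPrompt(num_experts=2, prompt_len=4, rank=3, hidden=8)
tokens = [[0.1] * 8, [0.2] * 8]  # two toy input token embeddings
weights = model.route(tokens)
prompt = model.soft_prompt(tokens)
```

In this toy setup the resulting `prompt` would be prepended to the input embeddings of a frozen model, and only the factors and the router would be trained. The dimensions are chosen for readability, so the parameter count here says nothing about the savings reported in the paper; the repository listed under Code & Models is the authoritative reference for the actual design.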

Code & Models

Repositories

zongqianli/pt-moe (PyTorch, official)

Taxonomy

Methods: Mixture of Experts