TL;DR
PT-MoE introduces a novel parameter-efficient fine-tuning framework that combines matrix decomposition with mixture-of-experts routing, achieving state-of-the-art results on question answering and mathematical problem-solving tasks while using 25% fewer parameters than LoRA.
Contribution
PT-MoE is the first framework to integrate matrix decomposition with MoE routing in prompt tuning, improving both efficiency and performance across multiple tasks.
Findings
Achieves a 1.49-point F1 improvement over PT on QA tasks.
Improves accuracy on mathematical problem-solving tasks by 10.75 points over PT.
Uses 25% fewer parameters than LoRA.
Abstract
Parameter-efficient fine-tuning (PEFT) methods have shown promise in adapting large language models, yet existing approaches exhibit counter-intuitive phenomena: integrating a router into prompt tuning (PT) increases training efficiency yet does not improve performance universally; parameter reduction through matrix decomposition can improve performance in specific domains. Motivated by these observations and the modular nature of PT, we propose PT-MoE, a novel framework that integrates matrix decomposition with mixture-of-experts (MoE) routing for efficient PT. Results across 17 datasets demonstrate that PT-MoE achieves state-of-the-art performance in both question answering (QA) and mathematical problem-solving tasks, improving F1 score by 1.49 points over PT and 2.13 points over LoRA in QA tasks, while enhancing mathematical accuracy by 10.75 points over PT and 0.44 points over LoRA,…
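The abstract names the two ingredients (matrix decomposition and MoE routing) but does not spell out how they are wired together. The following is a minimal NumPy sketch of one plausible arrangement, not the authors' implementation. It assumes that each expert owns a low-rank factor, that all experts share a second factor, and that a softmax router driven by a mean-pooled input embedding mixes the experts. All names here (`moe_prompt`, `router_W`, `A_experts`, `B_shared`) and all dimensions are hypothetical.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def moe_prompt(x_pooled, router_W, A_experts, B_shared):
    """Build a soft prompt from router-weighted low-rank experts.

    Assumed shapes (illustrative only, not taken from the paper):
      x_pooled:  (d,)                         pooled input embedding
      router_W:  (num_experts, d)             linear router
      A_experts: (num_experts, prompt_len, r) per-expert factor
      B_shared:  (r, d)                       factor shared by all experts
    """
    weights = softmax(router_W @ x_pooled)            # (num_experts,)
    A_mix = np.tensordot(weights, A_experts, axes=1)  # (prompt_len, r)
    prompt = A_mix @ B_shared                         # (prompt_len, d)
    return prompt, weights

rng = np.random.default_rng(0)
num_experts, prompt_len, rank, d = 4, 8, 2, 16
x_pooled = rng.normal(size=d)
router_W = rng.normal(size=(num_experts, d))
A_experts = rng.normal(size=(num_experts, prompt_len, rank))
B_shared = rng.normal(size=(rank, d))

prompt, weights = moe_prompt(x_pooled, router_W, A_experts, B_shared)
print(prompt.shape, weights.round(3))
```

In this sketch the resulting prompt has rank at most `rank`, which is where the parameter reduction from decomposition would come from; the router only changes how the per-expert factors are blended for a given input.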
Taxonomy
Methods: Mixture of Experts
