Expert Pyramid Tuning: Efficient Parameter Fine-Tuning for Expertise-Driven Task Allocation
Jia-Chen Zhang, Zhen-Wei Yan, Yu-Jie Xiong, Chun-Ming Xia

TL;DR
Expert Pyramid Tuning introduces a hierarchical, multi-scale approach to parameter-efficient fine-tuning of large language models, improving task adaptation by capturing diverse feature granularities and reducing training parameters.
Contribution
The paper proposes Expert Pyramid Tuning, a novel hierarchical architecture that integrates multi-scale feature pyramids into PEFT, outperforming existing MoE-LoRA methods while reducing parameters.
Findings
EPT outperforms SOTA MoE-LoRA variants on multiple benchmarks.
EPT reduces the number of training parameters compared to standard methods.
The hierarchical design captures diverse feature granularities effectively.
Abstract
Parameter-Efficient Fine-Tuning (PEFT) has become a dominant paradigm for deploying LLMs in multi-task scenarios due to its extreme parameter efficiency. While Mixture-of-Experts (MoE) based LoRA variants have achieved promising results by dynamically routing tokens to different low-rank experts, they largely overlook the hierarchical nature of task complexity. Existing methods typically employ experts with uniform architectures, limiting their ability to capture diverse feature granularities required by distinct tasks--where some tasks demand high-level semantic abstraction while others require fine-grained syntactic manipulation. To bridge this gap, we propose Expert Pyramid Tuning (EPT), a novel architecture that integrates the multi-scale feature pyramid concept from computer vision into the realm of PEFT. Unlike standard LoRA, EPT decomposes task adaptation into two stages: (1) A…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsDomain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Multimodal Machine Learning Applications
