pMoE: Prompting Diverse Experts Together Wins More in Visual Adaptation

Shentong Mo; Xufang Luo; Dongsheng Li

arXiv:2602.22938·cs.CV·February 27, 2026

TL;DR

The paper introduces pMoE, a novel prompt tuning method that combines multiple domain experts through dynamic dispatching, significantly improving visual adaptation performance across diverse tasks while maintaining computational efficiency.

Contribution

pMoE is the first to integrate multiple domain-specific experts via prompt tokens and dynamic dispatching, enhancing versatility and performance in visual adaptation tasks.

Findings

01. Achieves superior performance across 47 tasks.

02. Outperforms existing methods with significant margins.

03. Balances computational efficiency with adaptation effectiveness.

Abstract

Parameter-efficient fine-tuning has demonstrated promising results across various visual adaptation tasks, such as classification and segmentation. Prompt tuning techniques typically harness knowledge from a single pre-trained model, whether from a general or a specialized medical domain. However, this approach overlooks the potential synergies that could arise from integrating diverse domain knowledge within the same tuning process. In this work, we propose a novel Mixture-of-Experts prompt tuning method called pMoE, which leverages the strengths of multiple expert domains through expert-specialized prompt tokens and a learnable dispatcher, effectively combining their expertise in a unified model framework. Our pMoE introduces expert-specific prompt tokens and utilizes a dynamic token dispatching mechanism at various prompt layers to optimize the contribution of…
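The abstract names expert-specific prompt tokens and a learnable dispatcher but does not give the dispatching equations. The sketch below illustrates one plausible reading: softmax gating that mixes the prompt tokens of several experts, token by token. It is not the authors' implementation. The function names (`softmax`, `dispatch_prompts`), the per-expert gate vectors, and the choice of softmax scoring are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dispatch_prompts(expert_prompts, gate_vectors):
    """Mix per-expert prompt tokens into one set of prompt tokens.

    expert_prompts: K experts, each a list of T tokens, each token a
        list of D floats.
    gate_vectors: K vectors of length D. These stand in for learnable
        dispatcher parameters (a hypothetical parameterisation, not
        taken from the paper).

    Returns (mixed_tokens, weights), where weights[t][k] is the share
    of expert k in mixed token t.
    """
    num_experts = len(expert_prompts)
    num_tokens = len(expert_prompts[0])
    dim = len(expert_prompts[0][0])

    mixed_tokens, weights = [], []
    for t in range(num_tokens):
        # Score expert k's t-th token against that expert's gate vector.
        scores = [
            sum(g * x for g, x in zip(gate_vectors[k], expert_prompts[k][t]))
            for k in range(num_experts)
        ]
        probs = softmax(scores)
        # Convex combination of the experts' tokens at position t.
        token = [
            sum(probs[k] * expert_prompts[k][t][d] for k in range(num_experts))
            for d in range(dim)
        ]
        mixed_tokens.append(token)
        weights.append(probs)
    return mixed_tokens, weights
```

With all gate vectors set to zero, every expert receives the same weight and each mixed token is the plain average of the experts' tokens. Returning `weights` alongside the tokens also makes it easy to inspect how much each expert contributes at each position.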

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 4

Strengths

**Methodology**:
- Using a mixture of experts to combine different kinds of prompt tuning is an interesting idea.
- The authors also propose dynamic gating to selectively choose and integrate tokens from various experts, which is essential to (1) allow communication among experts for better information exchange and (2) determine which information should be retained.

**Experiments**:
- Reviewers appreciate the efforts of the paper to conduct several experiments and compare with sev…

Weaknesses

The reviewer found the following major weaknesses in this paper:

**Methodology**:
- The writing in Section 3.3 is convoluted, and the ideas behind it are difficult to grasp. For example, consider the sentence *These tokens are the added EPTs for all experts $P^{l}$, accumulated prompts from the last layer $Z_{P, expert_{k}}^{l}$, and patch tokens of the current expert $Z_{expert_{k}}^{l}$*. What does **these** refer to here? And what exactly is the equation for using $\hat{P}_{exper…

Reviewer 02 · Rating 6 · Confidence 3

Strengths

1. Proposed pMoE that effectively integrates knowledge from multiple domain experts using expert-specific prompt tokens and a dynamic dispatcher.
2. Demonstrated strong results in both general and medical domains, showing the method's adaptability across different fields and task types.

Weaknesses

1. While the results are comprehensive, the paper lacks visualizations that could help explain the inner workings of the **Dynamic Dispatcher**. For example, a more detailed breakdown of how the dispatcher allocates weights across different experts in specific tasks would make the method’s dynamics clearer.
2. Although the paper performs ablation studies on the number of experts (as shown in Table 12), it would be beneficial to see a more in-depth analysis of **how different domains influence ea…

Reviewer 03 · Rating 8 · Confidence 3

Strengths

- Proposed a novel prompt tuning framework that can unify diverse domain-specific models, enhancing the model’s versatility while remaining computationally efficient.
- Extensive experiments were conducted, with good results.
- The paper is well-written and easy to follow.

Weaknesses

- This prompt learning method is trained on A100-80GB GPUs, which I am afraid are not available in many places. Can this method be trained on smaller systems?

Videos

pMoE: Prompting Diverse Experts Together Wins More in Visual Adaptation · slideslive

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Face recognition and analysis