Multi-Task Dense Prediction Fine-Tuning with Mixture of Fine-Grained Experts
Yangyang Xu, Xi Ye, and Duo Su

TL;DR
This paper introduces Fine-Grained Mixture of Experts (FGMoE), an architecture for multi-task dense prediction that improves parameter efficiency and performance by decomposing task information at a finer granularity and enabling adaptive knowledge transfer across tasks.
Contribution
The paper presents a new FGMoE model with intra-task, shared, and global experts, enhancing multi-task learning for dense prediction with better parameter efficiency and task-specific adaptation.
Findings
FGMoE outperforms existing MoE-based models on NYUD-v2 and PASCAL-Context datasets.
FGMoE uses fewer parameters than these MoE-based baselines while achieving higher accuracy.
The architecture effectively balances shared and task-specific information.
Abstract
Multi-task learning (MTL) for dense prediction has shown promising results but still faces challenges in balancing shared representations with task-specific specialization. In this paper, we introduce a novel Fine-Grained Mixture of Experts (FGMoE) architecture that explores MoE-based MTL models through a combination of three key innovations and fine-tuning. First, we propose intra-task experts that partition along intermediate hidden dimensions of MLPs, enabling finer decomposition of task information while maintaining parameter efficiency. Second, we introduce shared experts that consolidate common information across different contexts of the same task, reducing redundancy and allowing routing experts to focus on unique aspects. Third, we design a global expert that facilitates adaptive knowledge transfer across tasks based on both input features and task requirements, promoting…
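The idea of partitioning an MLP along its intermediate hidden dimension can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the softmax top-k router, the gate weighting, and the tiny always-on shared expert are all assumptions made for illustration, and the global expert is left out because this summary does not specify it. The sketch shows one property that follows directly from the slicing: because the activation is elementwise, the slice experts sum back to the original dense MLP, so the partition itself adds no weights.

```python
import math
import random

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, x):
    # W has shape (out, in); returns W @ x as a list.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def make_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def mlp(x, W1, W2):
    # y = W2 @ relu(W1 @ x), with W1: (hidden, d_in) and W2: (d_out, hidden).
    return matvec(W2, relu(matvec(W1, x)))

def split_experts(W1, W2, num_experts):
    # Hypothetical helper: cut the hidden dimension into equal slices.
    # Each expert keeps a block of rows of W1 and the matching columns of W2.
    hidden = len(W1)
    assert hidden % num_experts == 0
    width = hidden // num_experts
    experts = []
    for e in range(num_experts):
        lo, hi = e * width, (e + 1) * width
        experts.append((W1[lo:hi], [row[lo:hi] for row in W2]))
    return experts

def softmax(z):
    m = max(z)
    ex = [math.exp(v - m) for v in z]
    s = sum(ex)
    return [v / s for v in ex]

def moe_forward(x, experts, router, shared, top_k):
    # Assumed routing rule: softmax gates, keep the top_k routed experts,
    # and always add the shared expert's output.
    gates = softmax(matvec(router, x))
    chosen = sorted(range(len(experts)), key=lambda i: gates[i], reverse=True)[:top_k]
    out = mlp(x, *shared)
    for i in chosen:
        y = mlp(x, *experts[i])
        out = [o + gates[i] * v for o, v in zip(out, y)]
    return out, chosen

rng = random.Random(0)
d_in, hidden, d_out, num_experts = 4, 8, 3, 4
W1 = make_matrix(hidden, d_in, rng)
W2 = make_matrix(d_out, hidden, rng)
x = [rng.uniform(-1.0, 1.0) for _ in range(d_in)]

experts = split_experts(W1, W2, num_experts)

# Sanity check: with every slice active at weight 1, the experts
# reproduce the dense MLP.
dense = mlp(x, W1, W2)
summed = [sum(vals) for vals in zip(*(mlp(x, *e) for e in experts))]

router = make_matrix(num_experts, d_in, rng)                    # assumed linear router
shared = (make_matrix(2, d_in, rng), make_matrix(d_out, 2, rng))  # assumed shared expert
out, chosen = moe_forward(x, experts, router, shared, top_k=2)
```

In this toy setup only two of the four slices run per input, so the routed part of the layer evaluates half of the hidden units while the shared expert contributes to every input.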
