Multi-Task Dense Prediction Fine-Tuning with Mixture of Fine-Grained Experts

Yangyang Xu, Xi Ye, and Duo Su

arXiv:2507.19077 · cs.CV · July 28, 2025

TL;DR

This paper introduces a novel Fine-Grained Mixture of Experts (FGMoE) architecture for multi-task dense prediction. FGMoE improves parameter efficiency and performance by decomposing task information into finer-grained experts and by enabling adaptive knowledge transfer across tasks.

Contribution

The paper presents FGMoE, a fine-grained Mixture-of-Experts model that combines intra-task, shared, and global experts. This design improves multi-task learning for dense prediction through better parameter efficiency and stronger task-specific adaptation.
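The following is a minimal, hypothetical sketch of how a single token might pass through an FGMoE-style layer that uses all three expert types. It is not the authors' implementation. The design choices below are all assumptions made for illustration:

- Each task has one always-on shared expert.
- A top-k softmax router selects among several small per-task routed experts.
- A single global expert is scaled by a sigmoid gate computed from the input concatenated with a learned task embedding.

The names `FGMoELayerSketch` and `relu_mlp` and the task labels are likewise hypothetical.

```python
import math
import random

def linear(x, w):
    # x: [n], w: [n][m] -> [m]
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]

def relu_mlp(x, w1, w2):
    # Two-layer bias-free MLP with ReLU, used for every expert in this sketch
    return linear([max(0.0, v) for v in linear(x, w1)], w2)

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

class FGMoELayerSketch:
    """Hypothetical layer: per-task shared + top-k routed experts, plus one gated global expert."""

    def __init__(self, d, hidden, num_routed, tasks, k, seed=0):
        rng = random.Random(seed)
        self.k = k
        mk = lambda: (rand_matrix(d, hidden, rng), rand_matrix(hidden, d, rng))
        # Task-specific capacity: one always-on shared expert and num_routed routed experts per task
        self.shared = {t: mk() for t in tasks}
        self.routed = {t: [mk() for _ in range(num_routed)] for t in tasks}
        self.router = {t: rand_matrix(d, num_routed, rng) for t in tasks}
        # Cross-task capacity: a single global expert shared by every task
        self.global_expert = mk()
        self.task_emb = {t: [rng.gauss(0.0, 1.0) for _ in range(d)] for t in tasks}
        self.global_gate = rand_matrix(2 * d, 1, rng)

    def forward(self, x, task):
        # 1) Shared expert for this task always contributes with weight 1
        out = relu_mlp(x, *self.shared[task])
        # 2) Top-k routed experts, softmax-renormalized over the selected ones
        logits = linear(x, self.router[task])
        top = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)[: self.k]
        m = max(logits[e] for e in top)
        z = sum(math.exp(logits[e] - m) for e in top)
        for e in top:
            w = math.exp(logits[e] - m) / z
            y = relu_mlp(x, *self.routed[task][e])
            out = [o + w * v for o, v in zip(out, y)]
        # 3) Global expert gated by input and task; list "+" concatenates into a 2d vector
        g = sigmoid(linear(x + self.task_emb[task], self.global_gate)[0])
        y = relu_mlp(x, *self.global_expert)
        out = [o + g * v for o, v in zip(out, y)]
        return out, top, g
```

Calling `forward` with different task names reuses the same global expert but selects different shared and routed experts. This mirrors the split between task-specific and cross-task capacity summarized above.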

Findings

1. FGMoE outperforms existing MoE-based models on the NYUD-v2 and PASCAL-Context datasets.

2. FGMoE achieves higher accuracy than these MoE-based baselines while using fewer parameters.

3. The architecture effectively balances shared and task-specific information.

Abstract

Multi-task learning (MTL) for dense prediction has shown promising results but still faces challenges in balancing shared representations with task-specific specialization. In this paper, we introduce a novel Fine-Grained Mixture of Experts (FGMoE) architecture that explores MoE-based MTL models through three key innovations combined with fine-tuning. First, we propose intra-task experts that partition along the intermediate hidden dimensions of MLPs, enabling finer decomposition of task information while maintaining parameter efficiency. Second, we introduce shared experts that consolidate common information across different contexts of the same task, reducing redundancy and allowing routing experts to focus on unique aspects. Third, we design a global expert that facilitates adaptive knowledge transfer across tasks based on both input features and task requirements, promoting…
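The first idea, partitioning intra-task experts along the intermediate hidden dimension of an MLP, can be illustrated with a small pure-Python sketch. The sketch makes two assumptions: the MLP has no biases, and it uses an elementwise activation (GELU here). Under those assumptions, slicing the hidden units into equal groups yields experts whose outputs sum exactly to the dense MLP's output. Their parameter counts also add up to the original, so the split by itself adds no parameters. `split_hidden`, `top_k_forward`, and the top-k softmax router are hypothetical stand-ins, not the paper's code.

```python
import math
import random

def gelu(x: float) -> float:
    # tanh approximation of GELU; it acts elementwise, so slicing hidden units is exact
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

def mlp(x, w1, w2):
    """Bias-free MLP. x: [d], w1: [d][h], w2: [h][d] -> [d]."""
    hidden = [gelu(sum(x[i] * w1[i][j] for i in range(len(x)))) for j in range(len(w2))]
    return [sum(hidden[j] * w2[j][k] for j in range(len(w2))) for k in range(len(w2[0]))]

def num_params(w1, w2):
    return sum(len(row) for row in w1) + sum(len(row) for row in w2)

def split_hidden(w1, w2, num_experts):
    """Hypothetical helper: cut the hidden dimension h into num_experts equal slices."""
    h = len(w2)
    if h % num_experts:
        raise ValueError("hidden size must be divisible by num_experts")
    size = h // num_experts
    experts = []
    for e in range(num_experts):
        cols = range(e * size, (e + 1) * size)
        e_w1 = [[row[j] for j in cols] for row in w1]  # a slice of W1's columns
        e_w2 = [w2[j] for j in cols]                    # the matching rows of W2
        experts.append((e_w1, e_w2))
    return experts

def top_k_forward(x, experts, gate_w, k):
    """Route x to the k highest-scoring experts, softmax-weighted over the selected ones."""
    logits = [sum(x[i] * gate_w[i][e] for i in range(len(x))) for e in range(len(experts))]
    top = sorted(range(len(experts)), key=lambda e: logits[e], reverse=True)[:k]
    m = max(logits[e] for e in top)
    weights = [math.exp(logits[e] - m) for e in top]
    z = sum(weights)
    out = [0.0] * len(x)
    for w, e in zip(weights, top):
        y = mlp(x, *experts[e])
        out = [o + (w / z) * v for o, v in zip(out, y)]
    return out, top

if __name__ == "__main__":
    rng = random.Random(0)
    d, h, n_exp = 4, 8, 4
    w1 = [[rng.gauss(0, 1) for _ in range(h)] for _ in range(d)]
    w2 = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(h)]
    x = [rng.gauss(0, 1) for _ in range(d)]
    experts = split_hidden(w1, w2, n_exp)
    dense = mlp(x, w1, w2)
    summed = [sum(v) for v in zip(*(mlp(x, a, b) for a, b in experts))]
    print("max |dense - sum of experts|:", max(abs(p - q) for p, q in zip(dense, summed)))
    print("params dense vs experts:", num_params(w1, w2), sum(num_params(a, b) for a, b in experts))
```

In an actual MoE layer only the top-k slices run for each token. Because the selected outputs are softmax-weighted, the routed result is a weighted subset of the dense computation rather than an exact copy of it.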
