Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs
Jinguo Zhu, Xizhou Zhu, Wenhai Wang, Xiaohua Wang, Hongsheng Li, Xiaogang Wang, Jifeng Dai

TL;DR
This paper introduces Conditional Mixture-of-Experts (Conditional MoEs) to reduce task and modality interference in generalist models such as Uni-Perceiver, achieving state-of-the-art results with minimal downstream data.
Contribution
It proposes Conditional MoEs to mitigate interference among tasks and modalities in generalist models, improving their performance and generalization.
Findings
Achieves state-of-the-art results on multiple downstream tasks.
Effectively mitigates task and modality interference.
Maintains zero-shot generalization to new tasks.
Abstract
To build an artificial neural network that resembles biological intelligence systems, recent works have unified numerous tasks into a generalist model, which processes various tasks with shared parameters and has no task-specific modules. While generalist models achieve promising results on various benchmarks, they suffer performance degradation on some tasks compared with task-specialized models. In this work, we find that interference among different tasks and modalities is the main cause of this phenomenon. To mitigate such interference, we introduce Conditional Mixture-of-Experts (Conditional MoEs) to generalist models. Routing strategies under different levels of conditions are proposed to take both the training/inference cost and generalization ability into account. By incorporating the proposed Conditional MoEs, the recently proposed generalist model Uni-Perceiver can…
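The idea of routing under different conditions can be illustrated with a small sketch. This is not the authors' implementation: the function names (`top_k_gate`, `conditional_moe`), the top-k softmax gate, the linear experts, and all dimensions are assumptions chosen for illustration. The only point it shows is that the gate can be driven by a condition vector that is either the token itself or some coarser attribute embedding, while the experts always transform the token.

```python
import numpy as np

def top_k_gate(condition, w_gate, k):
    """Pick k experts from `condition` and return (indices, normalized weights)."""
    logits = condition @ w_gate                    # shape: (num_experts,)
    top = np.argsort(logits)[-k:]                  # indices of the k largest logits
    exp = np.exp(logits[top] - logits[top].max())  # stable softmax over selected experts
    return top, exp / exp.sum()

def conditional_moe(token, condition, w_gate, experts, k=2):
    """Mix the outputs of the k experts selected by `condition` (hypothetical layer)."""
    indices, weights = top_k_gate(condition, w_gate, k)
    out = np.zeros(experts[0].shape[1])
    for i, w in zip(indices, weights):
        out += w * (token @ experts[i])            # each expert is a plain linear map here
    return out

rng = np.random.default_rng(0)
d_model, d_cond, num_experts = 8, 4, 4             # illustrative sizes, not from the paper
experts = [rng.normal(size=(d_model, d_model)) for _ in range(num_experts)]
w_gate_token = rng.normal(size=(d_model, num_experts))
w_gate_attr = rng.normal(size=(d_cond, num_experts))

token = rng.normal(size=d_model)
attr = rng.normal(size=d_cond)                     # assumed embedding of a task/modality attribute

# Condition = the token itself: every token may choose different experts.
y_token = conditional_moe(token, token, w_gate_token, experts)
# Condition = a shared attribute: all tokens with this attribute share one routing decision.
y_attr = conditional_moe(token, attr, w_gate_attr, experts)
```

In this sketch, routing on a shared attribute means the gate only needs to be evaluated once per attribute rather than once per token, which is one way a coarser condition could lower cost.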
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition
