Task-customized Masked AutoEncoder via Mixture of Cluster-conditional Experts
Zhili Liu, Kai Chen, Jianhua Han, Lanqing Hong, Hang Xu, Zhenguo Li, James T. Kwok

TL;DR
This paper introduces MoCE, a novel pre-training paradigm for Masked Autoencoders that creates task-specific models by leveraging cluster-conditional experts, improving transferability and performance across diverse downstream tasks.
Contribution
MoCE is the first to train a single MAE-based model with cluster-conditional experts, enabling customized pre-training for different downstream tasks.
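In a standard mixture of experts (MoE), a learned gate routes each token independently. A cluster-conditional expert layer instead receives whole images according to the cluster they belong to, so each expert is trained only on semantically similar images. The NumPy sketch below illustrates that idea under assumptions that are not taken from the paper: images are pre-clustered (e.g. by k-means), the gate is a hypothetical per-cluster embedding followed by a linear projection, routing is top-1, and each expert is a single linear map standing in for an MLP. The names `cluster_conditional_gate`, `cluster_conditional_moe`, `cluster_emb`, `gate_w` and `expert_w` are illustrative only, not the authors' code.

```python
import numpy as np


def cluster_conditional_gate(cluster_ids, cluster_emb, gate_w):
    """Return one expert index per image, chosen from its cluster ID only.

    cluster_ids: (B,) ints, the (assumed k-means) cluster of each image
    cluster_emb: (K, D) hypothetical learnable embedding per cluster
    gate_w:      (D, E) hypothetical gate projection onto E experts
    """
    logits = cluster_emb[cluster_ids] @ gate_w  # (B, E); ignores token content
    return logits.argmax(axis=-1)               # top-1 expert per image


def cluster_conditional_moe(tokens, cluster_ids, expert_w, cluster_emb, gate_w):
    """Route every token of an image to that image's expert.

    tokens:   (B, N, D) patch tokens
    expert_w: (E, D, D) one linear map per expert (toy stand-in for an MLP)
    Because all N tokens of image b share one expert, during training an
    expert's weights would only see images whose cluster is routed to it.
    """
    expert_idx = cluster_conditional_gate(cluster_ids, cluster_emb, gate_w)
    # out[b] = tokens[b] @ expert_w[expert_idx[b]]
    out = np.einsum("bnd,bde->bne", tokens, expert_w[expert_idx])
    return out, expert_idx


rng = np.random.default_rng(0)
K, D, E = 4, 8, 2                    # clusters, width, experts
cluster_emb = rng.normal(size=(K, D))
gate_w = rng.normal(size=(D, E))
expert_w = rng.normal(size=(E, D, D))
tokens = rng.normal(size=(3, 5, D))  # 3 images, 5 tokens each
cluster_ids = np.array([0, 2, 0])    # images 0 and 2 share cluster 0

out, expert_idx = cluster_conditional_moe(tokens, cluster_ids, expert_w, cluster_emb, gate_w)
print(expert_idx, out.shape)
```

Because the gate sees only the cluster ID, images 0 and 2 are guaranteed to reach the same expert regardless of their token content, whereas a token-level MoE gate could split them across experts.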
Findings
MoCE outperforms vanilla MAE by 2.45% on average across 11 tasks.
Achieves state-of-the-art results on detection and segmentation.
Effective in reducing negative transfer from irrelevant pre-training data.
Abstract
Masked Autoencoder (MAE) is a prevailing self-supervised learning method that achieves promising results in model pre-training. However, when downstream tasks have data distributions that differ from the pre-training data, semantically irrelevant pre-training information can cause negative transfer, impeding MAE's scalability. To address this issue, we propose a novel MAE-based pre-training paradigm, Mixture of Cluster-conditional Experts (MoCE), which is trained once yet provides customized pre-trained models for diverse downstream tasks. Unlike a standard mixture of experts (MoE), MoCE trains each expert only on semantically relevant images by using cluster-conditional gates. Thus, each downstream task can be allocated to a customized model pre-trained on the data most similar to the downstream data. Experiments on a collection of 11 downstream tasks…
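The abstract does not spell out how a downstream task is matched to its customized model. One plausible reading, sketched below purely as an assumption, is a nearest-centroid vote: embed the downstream images, assign each to the closest pre-training cluster centroid, map clusters to experts through a hypothetical `cluster_to_expert` table (for instance, the gate's top-1 choice per cluster), and keep the expert that receives the most votes. The toy 2-D features and the function names `assign_clusters` and `select_expert_for_task` are illustrative, not the paper's procedure.

```python
import numpy as np


def assign_clusters(features, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for each row."""
    d2 = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)  # (M, K)
    return d2.argmin(axis=1)


def select_expert_for_task(task_features, centroids, cluster_to_expert):
    """Pick the expert whose clusters cover the most downstream samples."""
    clusters = assign_clusters(task_features, centroids)
    experts = cluster_to_expert[clusters]  # expert index per downstream sample
    votes = np.bincount(experts, minlength=int(cluster_to_expert.max()) + 1)
    return int(votes.argmax()), votes


# Toy setup: 3 pre-training clusters in a 2-D feature space, 2 experts.
centroids = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
cluster_to_expert = np.array([0, 1, 1])  # hypothetical: clusters 1 and 2 share expert 1

# Downstream samples: three lie near clusters 1 and 2, one near cluster 0.
task_features = np.array([[9.5, 0.2], [10.3, -0.1], [0.1, 9.8], [0.4, 0.3]])

expert, votes = select_expert_for_task(task_features, centroids, cluster_to_expert)
print(expert, votes)  # 1 [1 3]
```

A softer variant could weight experts by the histogram of cluster assignments instead of taking a hard vote; the hard vote is used here only to keep the sketch short.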
Taxonomy
Topics: Advanced Clustering Algorithms Research · Speech and dialogue systems · Context-Aware Activity Recognition Systems
Methods: Masked autoencoder
