Task-customized Masked AutoEncoder via Mixture of Cluster-conditional Experts

Zhili Liu; Kai Chen; Jianhua Han; Lanqing Hong; Hang Xu; Zhenguo Li; James T. Kwok

arXiv:2402.05382 · cs.CV · February 9, 2024 · 6 cites

TL;DR

This paper introduces MoCE, a novel pre-training paradigm for Masked Autoencoders that creates task-specific models by leveraging cluster-conditional experts, improving transferability and performance across diverse downstream tasks.

Contribution

MoCE is the first to train a single MAE-based model with cluster-conditional experts, enabling customized pre-training for different downstream tasks.

Findings

01. MoCE outperforms vanilla MAE by 2.45% on average across 11 tasks.

02. MoCE achieves state-of-the-art results on detection and segmentation.

03. MoCE is effective in reducing negative transfer from irrelevant pre-training data.

Abstract

Masked Autoencoder (MAE) is a prevailing self-supervised learning method that achieves promising results in model pre-training. However, when the various downstream tasks have data distributions different from the pre-training data, the semantically irrelevant pre-training information might result in negative transfer, impeding MAE's scalability. To address this issue, we propose a novel MAE-based pre-training paradigm, Mixture of Cluster-conditional Experts (MoCE), which can be trained once but provides customized pre-training models for diverse downstream tasks. Different from the mixture of experts (MoE), our MoCE trains each expert only with semantically relevant images by using cluster-conditional gates. Thus, each downstream task can be allocated to its customized model pre-trained with data most similar to the downstream data. Experiments on a collection of 11 downstream tasks…
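
The routing idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say how clusters are computed or how a downstream task is matched to an expert, so the nearest-centroid assignment, the `cluster_to_expert` table, and the majority vote below are all assumptions made for illustration, and every function name is hypothetical.

```python
from collections import Counter


def nearest_cluster(feature, centroids):
    """Return the index of the centroid closest to `feature` (squared Euclidean).

    Assumption: clusters are represented by centroids in some feature space.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(centroids)), key=lambda k: sq_dist(feature, centroids[k]))


def route_to_expert(feature, centroids, cluster_to_expert):
    """Cluster-conditional gate: the expert depends only on the image's cluster."""
    return cluster_to_expert[nearest_cluster(feature, centroids)]


def select_expert_for_task(task_features, centroids, cluster_to_expert):
    """Pick the expert that most downstream images are routed to.

    Assumption: a simple majority vote stands in for "most similar data".
    """
    votes = Counter(
        route_to_expert(f, centroids, cluster_to_expert) for f in task_features
    )
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): three clusters in 2-D shared by two experts.
centroids = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
cluster_to_expert = {0: 0, 1: 1, 2: 1}

# Features of a small downstream dataset (hypothetical).
task_features = [(9.0, 1.0), (1.0, 9.0), (0.5, 0.5)]
chosen = select_expert_for_task(task_features, centroids, cluster_to_expert)
print(chosen)
```

In this toy setup two of the three downstream images fall into clusters owned by expert 1, so that expert would be the one selected for the task. The point of the sketch is only that the gate is conditioned on a per-image cluster, so each expert sees semantically related images.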

Videos

Task-customized Masked Autoencoder via Mixture of Cluster-conditional Experts (SlidesLive)

Taxonomy

Topics: Advanced Clustering Algorithms Research · Speech and dialogue systems · Context-Aware Activity Recognition Systems

Methods: Masked autoencoder