Domain Invariant Masked Autoencoders for Self-supervised Learning from Multi-domains
Haiyang Yang, Meilin Chen, Yizhou Wang, Shixiang Tang, Feng Zhu, Lei Bai, Rui Zhao, Wanli Ouyang

TL;DR
This paper introduces DiMAE, a self-supervised learning method that learns domain-invariant features by augmenting images with style noise and reconstructing the original image from the augmented one, improving cross-domain generalization.
Contribution
It proposes a novel cross-domain reconstruction task with style augmentation and multiple decoders to enhance domain-invariant feature learning.
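A toy sketch of how such an objective could be wired up, assuming a shared encoder, one decoder per training domain, and a reconstruction target equal to the clean (un-augmented) image. The function name `masked_cross_domain_loss`, the linear encoder and decoders, the mean-pooled context vector, and the 0.75 mask ratio are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def masked_cross_domain_loss(x_clean, x_aug, domain_id, encoder_w, decoder_ws,
                             mask_ratio=0.75, rng=None):
    """Toy cross-domain masked reconstruction loss (hypothetical, linear model).

    x_clean, x_aug: (num_patches, patch_dim) patches of the original image and
        of its style-augmented version.
    domain_id: index of the domain the original image belongs to.
    encoder_w: (patch_dim, latent_dim) weights of the shared encoder.
    decoder_ws: list of (latent_dim, patch_dim) weights, one decoder per domain.
    """
    rng = np.random.default_rng() if rng is None else rng
    num_patches = x_clean.shape[0]
    # Keep at least one visible and one masked patch.
    num_masked = min(max(int(round(mask_ratio * num_patches)), 1), num_patches - 1)
    perm = rng.permutation(num_patches)
    masked_idx, visible_idx = perm[:num_masked], perm[num_masked:]

    # The shared encoder only sees visible patches of the AUGMENTED image.
    latent = x_aug[visible_idx] @ encoder_w
    # Crude stand-in for a real decoder: pool visible latents into one context.
    context = latent.mean(axis=0)
    # Route to the decoder that belongs to the image's own domain.
    prediction = context @ decoder_ws[domain_id]

    # Score against the CLEAN patches, so style noise has to be discarded.
    target = x_clean[masked_idx]
    return float(np.mean((target - prediction) ** 2))
```

Because the target is the clean image while the input carries foreign style, the shared encoder gains nothing from encoding style, and each domain's decoder is left to restore it.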
Findings
DiMAE outperforms recent state-of-the-art methods on PACS and DomainNet datasets.
The style mix augmentation preserves content while adding style diversity.
Multiple decoders effectively recover domain-specific styles for better reconstruction.
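The summary does not spell out how the style mix is computed. As a hedged illustration of the "preserve content, vary style" idea, the sketch below mixes channel-wise mean and standard deviation between a content image and an image from another domain, in the spirit of AdaIN-style statistic transfer. The function name `style_mix` and the `alpha` mixing weight are assumptions made for this example, and the paper's actual augmentation may differ.

```python
import numpy as np

def style_mix(content_img, style_img, alpha=0.5, eps=1e-6):
    """Mix channel statistics of style_img into content_img (hypothetical sketch).

    content_img, style_img: float arrays of shape (H, W, C).
    alpha: 0.0 keeps the content image's own statistics, 1.0 adopts the
        style image's statistics.
    """
    # Per-channel statistics act as a simple proxy for "style".
    c_mean = content_img.mean(axis=(0, 1), keepdims=True)
    c_std = content_img.std(axis=(0, 1), keepdims=True) + eps
    s_mean = style_img.mean(axis=(0, 1), keepdims=True)
    s_std = style_img.std(axis=(0, 1), keepdims=True) + eps

    # Interpolate between the two sets of statistics.
    mixed_mean = (1.0 - alpha) * c_mean + alpha * s_mean
    mixed_std = (1.0 - alpha) * c_std + alpha * s_std

    # Normalizing keeps the spatial structure (the "content") and removes the
    # original statistics; rescaling applies the mixed ones.
    normalized = (content_img - c_mean) / c_std
    return normalized * mixed_std + mixed_mean
```

Under this assumption the output is a per-channel affine transform of the content image, so edges and layout are unchanged while colour and contrast statistics shift toward the other domain.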
Abstract
Generalizing learned representations across significantly different visual domains is a fundamental and crucial ability of the human visual system. While recent self-supervised learning methods perform well when the evaluation set comes from the same domain as the training set, their performance drops undesirably when they are tested on a different domain. The task of self-supervised learning from multiple domains was therefore proposed to learn domain-invariant features that are not only suitable for evaluation on the training domain but also generalize to unseen domains. In this paper, we propose a Domain-invariant Masked AutoEncoder (DiMAE) for self-supervised learning from multiple domains, which designs a new pretext task, i.e., the cross-domain reconstruction task, to learn domain-invariant features. The core idea is to augment the input image…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning
