Learning with Unmasked Tokens Drives Stronger Vision Learners

Taekyung Kim; Sanghyuk Chun; Byeongho Heo; Dongyoon Han

arXiv:2310.13593 · cs.CV · August 27, 2024 · 1 citation

TL;DR

This paper improves masked image modeling (MIM) by explicitly incorporating unmasked tokens into training, yielding more discriminative representations and gains on ImageNet-1K classification, semantic segmentation, and fine-grained classification.

Contribution

The authors introduce a method that explicitly uses unmasked tokens during MIM pre-training, enhancing context learning and resulting in stronger vision representations.

Findings

01 Achieved 84.2% top-1 accuracy on ImageNet-1K with ViT-B.

02 Improved performance on semantic segmentation and fine-grained classification.

03 Enhanced model robustness across diverse evaluation metrics.

Abstract

Masked image modeling (MIM) has become a leading self-supervised learning strategy. MIM methods such as the Masked Autoencoder (MAE) learn strong representations by randomly masking input tokens, feeding the remaining tokens to the encoder, and having the decoder reconstruct the masked tokens of the input. However, MIM pre-trained encoders often exhibit a limited attention span, attributed to MIM's sole focus on regressing masked tokens, which may impede the encoder's broader context learning. To address this limitation, we improve MIM by explicitly incorporating unmasked tokens into the training process. Specifically, our method enables the encoder to learn from broader context supervision, allowing unmasked tokens to experience broader contexts while the decoder reconstructs masked tokens. Thus, the encoded unmasked tokens are equipped with extensive contextual information, empowering masked tokens to leverage…
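The baseline MIM setup the abstract describes (randomly mask tokens, encode the visible ones, regress only the masked ones) can be illustrated with a minimal sketch. This is a generic MAE-style illustration, not the authors' method or the code in naver-ai/lut. The function names `random_masking` and `masked_mse`, the 75% mask ratio, and the 196-token count are assumptions chosen for the example and are not stated on this page.

```python
import random


def random_masking(num_tokens, mask_ratio=0.75, seed=None):
    """Split token indices into visible and masked sets (MAE-style sketch).

    mask_ratio=0.75 is an assumed default for illustration only.
    """
    rng = random.Random(seed)
    order = list(range(num_tokens))
    rng.shuffle(order)  # random permutation of token indices
    num_keep = int(num_tokens * (1 - mask_ratio))
    visible = sorted(order[:num_keep])  # tokens the encoder would process
    masked = sorted(order[num_keep:])   # tokens the decoder would reconstruct
    return visible, masked


def masked_mse(pred, target, masked):
    """Mean squared error over masked token positions only.

    Visible (unmasked) positions contribute nothing to this loss, which is
    the "regressing masked tokens only" behavior the abstract refers to.
    Assumes `masked` is non-empty.
    """
    per_token = []
    for i in masked:
        diffs = [(p - t) ** 2 for p, t in zip(pred[i], target[i])]
        per_token.append(sum(diffs) / len(diffs))
    return sum(per_token) / len(per_token)


if __name__ == "__main__":
    # 196 tokens is an assumed count (e.g. a 14 x 14 patch grid).
    visible, masked = random_masking(196, mask_ratio=0.75, seed=0)
    print(len(visible), len(masked))  # prints "49 147"
```

In this sketch the unmasked tokens receive no direct supervision, which is the gap the paper's approach targets by bringing unmasked tokens explicitly into training.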

Code & Models

Repositories

naver-ai/lut (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis

Methods: Focus · Mutual Information Machine/Mask Image Modeling