Contrastive Tuning: A Little Help to Make Masked Autoencoders Forget
Johannes Lehner, Benedikt Alkin, Andreas Fürst, Elisabeth Rumetshofer, Lukas Miklautz, Sepp Hochreiter

TL;DR
This paper introduces MAE-CT, a contrastive tuning method that makes masked autoencoders produce more object-focused, semantically clustered features, which support downstream classification when little labeled data is available.
Contribution
The paper proposes MAE-CT, a novel sequential contrastive tuning approach that improves masked autoencoders by inducing semantic object clusters without labels, with minimal additional computation.
Findings
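MAE-CT outperforms previous self-supervised methods trained on ImageNet in linear probing, k-NN, and low-shot classification, as well as in unsupervised clustering accuracy.
Achieves a state-of-the-art linear probing accuracy of 82.2% with ViT-H/16.
Requires only minimal data augmentation (crop and flip) and at most 10% computational overhead compared with MAE re-training.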
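Linear probing, the evaluation behind the 82.2% figure, keeps the pre-trained encoder frozen and trains only a linear classifier on its output features, so it measures how linearly separable the learned classes already are. The following is a minimal pure-Python sketch that assumes features have already been extracted. The function names, the toy two-dimensional features, the learning rate, and the epoch count are hypothetical and do not reflect the paper's ImageNet/ViT setup.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]


def train_linear_probe(features, labels, num_classes, lr=0.5, epochs=200):
    """Fit weights W and bias b of a linear softmax classifier.

    The features are treated as frozen inputs: only W and b are updated,
    using full-batch gradient descent on the cross-entropy loss.
    """
    dim = len(features[0])
    W = [[0.0] * dim for _ in range(num_classes)]
    b = [0.0] * num_classes
    n = len(features)
    for _ in range(epochs):
        grad_W = [[0.0] * dim for _ in range(num_classes)]
        grad_b = [0.0] * num_classes
        for x, y in zip(features, labels):
            logits = [sum(w * xi for w, xi in zip(W[c], x)) + b[c]
                      for c in range(num_classes)]
            probs = softmax(logits)
            for c in range(num_classes):
                # Gradient of cross-entropy w.r.t. logit c is p_c - 1[c == y].
                g = probs[c] - (1.0 if c == y else 0.0)
                grad_b[c] += g / n
                for j in range(dim):
                    grad_W[c][j] += g * x[j] / n
        for c in range(num_classes):
            b[c] -= lr * grad_b[c]
            for j in range(dim):
                W[c][j] -= lr * grad_W[c][j]
    return W, b


def predict(W, b, x):
    """Return the index of the highest-scoring class for feature vector x."""
    scores = [sum(w * xi for w, xi in zip(Wc, x)) + bc for Wc, bc in zip(W, b)]
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(W, b, features, labels):
    """Fraction of samples whose predicted class matches the label."""
    correct = sum(predict(W, b, x) == y for x, y in zip(features, labels))
    return correct / len(labels)
```

When the frozen features already form well-separated clusters, as MAE-CT aims for, even this simple linear classifier can separate the classes without any change to the encoder.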
Abstract
Masked Image Modeling (MIM) methods, like Masked Autoencoders (MAE), efficiently learn a rich representation of the input. However, for adapting to downstream tasks, they require a sufficient amount of labeled data since their rich features code not only objects but also less relevant image background. In contrast, Instance Discrimination (ID) methods focus on objects. In this work, we study how to combine the efficiency and scalability of MIM with the ability of ID to perform downstream classification in the absence of large amounts of labeled data. To this end, we introduce Masked Autoencoder Contrastive Tuning (MAE-CT), a sequential approach that utilizes the implicit clustering of the Nearest Neighbor Contrastive Learning (NNCLR) objective to induce abstraction in the topmost layers of a pre-trained MAE. MAE-CT tunes the rich features such that they form semantic clusters of objects…
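The implicit clustering that the abstract attributes to NNCLR comes from its choice of positive. Each sample's embedding is replaced by its nearest neighbor in a support queue of earlier embeddings, and that neighbor is then contrasted against the embeddings of the other augmented view. The following is a minimal pure-Python sketch of that loss. It assumes a simplified one-directional form without NNCLR's predictor head or loss symmetrization. The function names and the temperature of 0.1 are illustrative choices, not MAE-CT's settings, and the sketch does not show how MAE-CT attaches the objective to the topmost layers of the pre-trained encoder.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def nearest_neighbor(query, support):
    """Return the support vector with the highest cosine similarity to query.

    All vectors are assumed to be unit-normalized, so the dot product
    equals cosine similarity.
    """
    return max(support, key=lambda s: dot(query, s))


def nnclr_loss(z1, z2, support, temperature=0.1):
    """Simplified one-directional NNCLR (InfoNCE with nearest-neighbor positives).

    z1, z2: embeddings of two augmented views of the same batch
            (z1[i] and z2[i] come from the same image).
    support: queue of embeddings from previous batches.
    For sample i, NN(z1[i]) paired with z2[i] is the positive pair;
    the other z2[k] in the batch serve as negatives.
    """
    z1 = [l2_normalize(v) for v in z1]
    z2 = [l2_normalize(v) for v in z2]
    support = [l2_normalize(v) for v in support]
    total = 0.0
    for i, anchor in enumerate(z1):
        nn = nearest_neighbor(anchor, support)
        logits = [dot(nn, z) / temperature for z in z2]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += -(logits[i] - log_denom)
    return total / len(z1)
```

Because the positive is a different but nearby sample from the queue rather than the sample itself, semantically similar images are pulled toward each other. This is what lets the objective group features into clusters without labels.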
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI · Multimodal Machine Learning Applications
Methods: Multi-Head Attention · Attention Is All You Need · Masked autoencoder · Position-Wise Feed-Forward Layer · Label Smoothing · Dropout · Absolute Position Encodings · Residual Connection · k-Nearest Neighbors · Softmax
