Improving Visual Representation Learning through Perceptual Understanding
Samyakh Tukra, Frederick Hoffman, Ken Chatfield

TL;DR
This paper introduces Perceptual MAE, an extension of masked autoencoders that enhances image representations by incorporating perceptual similarity and adversarial training techniques, leading to improved performance on downstream tasks.
Contribution
It proposes a novel extension to MAE that explicitly encourages learning of higher scene-level features using perceptual similarity and adversarial training methods.
Findings
Achieves 78.1% top-1 accuracy on ImageNet-1K with linear probing.
Reaches up to 88.1% accuracy with fine-tuning.
Outperforms previous methods on various downstream tasks.
Abstract
We present an extension to masked autoencoders (MAE) which improves on the representations learnt by the model by explicitly encouraging the learning of higher scene-level features. We do this by: (i) introducing a perceptual similarity term between generated and real images; and (ii) incorporating several techniques from the adversarial training literature, including multi-scale training and adaptive discriminator augmentation. The combination of these results not only in better pixel reconstruction but also in representations which appear to better capture higher-level details within images. More consequentially, we show how our method, Perceptual MAE, leads to better performance when used for downstream tasks, outperforming previous methods. We achieve 78.1% top-1 accuracy with linear probing on ImageNet-1K and up to 88.1% when fine-tuning, with similar results for other downstream tasks,…
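To make the abstract's description concrete, the following is a minimal toy sketch of how a masked reconstruction term, a perceptual similarity term, and an adversarial term might be combined into one training objective. It is not the authors' implementation: the function names, the `toy_features` stand-in for a perceptual network, the non-saturating form of the adversarial loss, and the weights `w_perc` and `w_adv` are all assumptions made for illustration, and images are reduced to flat lists of floats.

```python
import math

def masked_mse(pred, target, mask):
    """Mean squared error over masked positions only (mask value 1 = hidden patch)."""
    num = sum(m * (p - t) ** 2 for p, t, m in zip(pred, target, mask))
    den = sum(mask)
    return num / den if den else 0.0

def toy_features(x):
    """Hypothetical stand-in for a fixed perceptual network:
    average pooling at two scales (adjacent pairs, then the whole signal)."""
    pairs = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]
    return [pairs, [sum(x) / len(x)]]

def perceptual_distance(pred, target):
    """Sum over feature levels of the mean squared feature difference."""
    total = 0.0
    for fp, ft in zip(toy_features(pred), toy_features(target)):
        total += sum((a - b) ** 2 for a, b in zip(fp, ft)) / len(fp)
    return total

def generator_adversarial_loss(disc_logit_on_fake):
    """Assumed non-saturating generator loss, -log(sigmoid(z)),
    written as a numerically stable softplus(-z)."""
    z = disc_logit_on_fake
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))

def total_loss(pred, target, mask, disc_logit_on_fake, w_perc=1.0, w_adv=0.1):
    """Weighted sum of the three terms; the weights are illustrative, not from the paper."""
    return (masked_mse(pred, target, mask)
            + w_perc * perceptual_distance(pred, target)
            + w_adv * generator_adversarial_loss(disc_logit_on_fake))
```

In this sketch a perfect reconstruction drives the first two terms to zero, leaving only the adversarial term, which shrinks as the discriminator's logit on the generated image grows. The multi-scale training and adaptive discriminator augmentation mentioned in the abstract are not modelled here.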
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · COVID-19 diagnosis using AI
Methods: Masked autoencoder
