Suppressing Non-Semantic Noise in Masked Image Modeling Representations
Martine Hjelkrem-Tan, Marius Aasan, Rwiddhi Chakraborty, Gabriel Y. Arteaga, Changkyu Choi, Adín Ramírez Rivera

TL;DR
This paper shows that Masked Image Modeling (MIM) representations retain non-semantic information and introduces SOAP, a simple post-hoc method that suppresses it, improving zero-shot performance across MIM-based models.
Contribution
The paper proposes a model-agnostic PCA-based score and a novel suppression method, SOAP, to enhance semantic invariance in MIM representations without additional training.
Findings
SOAP improves zero-shot performance across various MIM models.
The PCA-based score effectively measures semantic invariance.
SOAP is a simple, model-agnostic, post-hoc linear method.
Abstract
Masked Image Modeling (MIM) has become a ubiquitous self-supervised vision paradigm. In this work, we show that MIM objectives cause the learned representations to retain non-semantic information, which ultimately hurts performance during inference. We introduce a model-agnostic score for semantic invariance using Principal Component Analysis (PCA) on real and synthetic non-semantic images. Based on this score, we propose a simple method, Semantically Orthogonal Artifact Projection (SOAP), to directly suppress non-semantic information in patch representations, leading to consistent improvements in zero-shot performance across various MIM-based models. SOAP is a post-hoc suppression method, requires zero training, and can be attached to any model as a single linear head.
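The abstract describes SOAP only at a high level: PCA on representations of non-semantic images, followed by a single linear head that suppresses that information in patch representations. The sketch below is one plausible reading of that idea, not the authors' implementation. It assumes the non-semantic directions are the top-k principal components of features extracted from non-semantic images, and that suppression means projecting patch features onto the orthogonal complement of those directions. The function names, the choice of `k`, and the energy-fraction helper are all hypothetical and used here purely for illustration.

```python
import numpy as np


def fit_artifact_basis(nonsemantic_feats: np.ndarray, k: int) -> np.ndarray:
    """Estimate k candidate non-semantic directions with PCA (assumption).

    nonsemantic_feats: (n, d) features of non-semantic images.
    Returns a (d, k) matrix with orthonormal columns.
    """
    centered = nonsemantic_feats - nonsemantic_feats.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T


def suppression_matrix(basis: np.ndarray) -> np.ndarray:
    """Build the orthogonal projector P = I - U U^T that removes span(U)."""
    d = basis.shape[0]
    return np.eye(d) - basis @ basis.T


def apply_head(patch_feats: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Apply the projector as a bias-free linear head to (..., d) features."""
    return patch_feats @ proj


def artifact_energy_fraction(patch_feats: np.ndarray, basis: np.ndarray) -> float:
    """Hypothetical diagnostic: share of squared feature norm inside span(U).

    This is NOT the score defined in the paper, only a simple stand-in.
    """
    inside = np.sum((patch_feats @ basis) ** 2)
    total = np.sum(patch_feats ** 2)
    return float(inside / total)
```

Because the projector is a fixed `d x d` matrix, it can be folded into a frozen model as one linear layer without bias, which matches the abstract's description of a training-free, post-hoc head. How the actual method selects directions and how its score is defined should be taken from the paper itself.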
