From Pixels to Components: Eigenvector Masking for Visual Representation Learning
Alice Bizeul, Thomas Sutter, Alain Ryser, Bernhard Schölkopf, Julius von Kügelgen, Julia E. Vogt

TL;DR
This paper introduces a novel masked image modeling approach that masks principal components instead of pixels, leading to better high-level feature learning and improved image classification performance.
Contribution
It proposes a PCA-based masking strategy that enhances the learning of global, high-level features over traditional pixel-based masking methods.
Findings
Improved image classification accuracy with component masking
Component masking captures more global information than pixel masking
The PCA-based approach is simple and robust
Abstract
Predicting masked from visible parts of an image is a powerful self-supervised approach for visual representation learning. However, the common practice of masking random patches of pixels exhibits certain failure modes, which can prevent learning meaningful high-level features, as required for downstream tasks. We propose an alternative masking strategy that operates on a suitable transformation of the data rather than on the raw pixels. Specifically, we perform principal component analysis and then randomly mask a subset of components, which accounts for a fixed ratio of the data variance. The learning task then amounts to reconstructing the masked components from the visible ones. Compared to local patches of pixels, the principal components of images carry more global information. We thus posit that predicting masked from visible components involves more high-level features,…
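The masking step described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names (`fit_pca`, `sample_component_mask`, `split_visible_masked`), the random stand-in data, the 0.5 variance ratio, and the greedy rule of adding randomly ordered components until the target ratio is reached are all assumptions made for this example. The abstract does not specify how the subset is sampled.

```python
import numpy as np


def fit_pca(X):
    """Return the data mean, eigenvectors (as columns), and eigenvalues,
    sorted by decreasing variance."""
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / (len(X) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    return mean, eigvecs[:, order], np.clip(eigvals[order], 0.0, None)


def sample_component_mask(eigvals, variance_ratio, rng):
    """Assumed sampling rule: visit components in random order and mask
    them until they account for at least `variance_ratio` of the variance."""
    explained = eigvals / eigvals.sum()
    masked = np.zeros(len(eigvals), dtype=bool)
    total = 0.0
    for idx in rng.permutation(len(eigvals)):
        if total >= variance_ratio:
            break
        masked[idx] = True
        total += explained[idx]
    return masked


def split_visible_masked(X, mean, eigvecs, masked):
    """Project onto the principal components, then map the visible and
    masked coefficients back to input space separately."""
    Z = (X - mean) @ eigvecs                     # component coefficients
    visible = (Z * ~masked) @ eigvecs.T + mean   # model input
    target = (Z * masked) @ eigvecs.T            # reconstruction target
    return visible, target


# Stand-in for flattened images: 200 samples with 16 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))

mean, eigvecs, eigvals = fit_pca(X)
masked = sample_component_mask(eigvals, variance_ratio=0.5, rng=rng)
visible, target = split_visible_masked(X, mean, eigvecs, masked)
explained_masked = (eigvals / eigvals.sum())[masked].sum()
```

Because the eigenvectors form an orthonormal basis, `visible + target` recovers `X` exactly, so a model trained to predict `target` from `visible` is reconstructing precisely the information that was removed.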
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques
