Masked Image Modeling: A Survey

Vlad Hondru; Florinel Alin Croitoru; Shervin Minaee; Radu Tudor Ionescu; Nicu Sebe

arXiv:2408.06687 · cs.CV · July 11, 2025


TL;DR

This survey reviews recent masked image modeling techniques in computer vision, categorizing approaches, analyzing datasets, and highlighting future research directions in self-supervised learning.

Contribution

It provides a comprehensive taxonomy and a performance comparison, and it identifies research gaps in masked image modeling methods.

Findings

01. Two main categories of MIM: reconstruction and contrastive learning
02. A hierarchical clustering-based taxonomy of MIM approaches
03. Performance aggregation on popular datasets

Abstract

In this work, we survey recent studies on masked image modeling (MIM), an approach that emerged as a powerful self-supervised learning technique in computer vision. The MIM task involves masking some information, e.g. pixels, patches, or even latent representations, and training a model, usually an autoencoder, to predict the missing information by using the context available in the visible part of the input. We identify and formalize two categories of approaches on how to implement MIM as a pretext task, one based on reconstruction and one based on contrastive learning. Then, we construct a taxonomy and review the most prominent papers in recent years. We complement the manually constructed taxonomy with a dendrogram obtained by applying a hierarchical clustering algorithm. We further identify relevant clusters by manually inspecting the resulting dendrogram. Our review also…
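To make the reconstruction-based variant of the MIM task described in the abstract concrete, here is a minimal sketch in plain Python. It is illustrative only and is not taken from the survey or its repository: the function names `random_patch_mask` and `masked_mse`, the 0.75 mask ratio, the choice to score only the masked patches, and the placeholder "model" that predicts the mean of the visible values are all assumptions made for this example.

```python
import random


def random_patch_mask(num_patches, mask_ratio, seed=0):
    """Return a list of booleans, True where a patch is hidden from the model.

    Hypothetical helper: uniform random masking is only one possible strategy.
    """
    rng = random.Random(seed)
    num_masked = int(round(num_patches * mask_ratio))
    masked_ids = set(rng.sample(range(num_patches), num_masked))
    return [i in masked_ids for i in range(num_patches)]


def masked_mse(target_patches, predicted_patches, mask):
    """Mean squared error averaged over the masked patches only (an assumption)."""
    total, count = 0.0, 0
    for target, predicted, is_masked in zip(target_patches, predicted_patches, mask):
        if not is_masked:
            continue  # visible patches serve as context, not as targets
        total += sum((t - p) ** 2 for t, p in zip(target, predicted))
        count += len(target)
    return total / count if count else 0.0


# Toy input: 16 patches, each flattened to 4 values.
patches = [[float(i)] * 4 for i in range(16)]

# Hide 75% of the patches (assumed ratio, chosen for illustration).
mask = random_patch_mask(len(patches), mask_ratio=0.75)
visible = [p for p, m in zip(patches, mask) if not m]

# Placeholder "model": predicts the mean of the visible values for every patch.
# A real MIM setup would use a trained network, typically an autoencoder.
mean_visible = sum(sum(p) for p in visible) / (len(visible) * 4)
predictions = [[mean_visible] * 4 for _ in patches]

loss = masked_mse(patches, predictions, mask)
print(f"masked patches: {sum(mask)}, reconstruction loss: {loss:.3f}")
```

In a training loop, this loss would be minimized with respect to the model parameters. The contrastive category mentioned in the abstract uses a different objective and is not covered by this sketch.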

Code & Models

Repositories

vladhondru25/mim-survey (PyTorch, official)

Taxonomy

Topics: 3D Surveying and Cultural Heritage

Methods: Mutual Information Machine / Mask Image Modeling