Understanding Masked Autoencoders via Hierarchical Latent Variable Models
Lingjing Kong, Martin Q. Ma, Guangyi Chen, Eric P. Xing, Yuejie Chi, Louis-Philippe Morency, Kun Zhang

TL;DR
This paper provides a theoretical understanding of masked autoencoders (MAE) by modeling data with hierarchical latent variables, explaining how hyperparameters influence the semantic level of learned representations.
Contribution
It introduces a hierarchical latent variable model to theoretically justify MAE's ability to extract high-level features and analyzes the impact of hyperparameters on the learned representations.
Findings
MAE can identify latent variables under certain assumptions
Hyperparameters like masking ratio and patch size affect the semantic level of representations
Theoretical explanations align with empirical observations
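To make the two hyperparameters in the second finding concrete, here is a minimal sketch of random patch masking, assuming a square image and uniform sampling of masked patches. It is an illustration only, not the authors' code; the function name `random_patch_mask` and its parameters are hypothetical.

```python
import numpy as np

def random_patch_mask(image_size, patch_size, mask_ratio, seed=0):
    """Return a boolean pixel mask (True = hidden) for a square image.

    Hypothetical helper for illustration; assumes uniform random masking.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    grid = image_size // patch_size            # patches per side
    num_patches = grid * grid
    num_masked = int(round(mask_ratio * num_patches))

    # Choose which patches to hide, without replacement.
    rng = np.random.default_rng(seed)
    masked_ids = rng.choice(num_patches, size=num_masked, replace=False)

    patch_mask = np.zeros(num_patches, dtype=bool)
    patch_mask[masked_ids] = True
    patch_mask = patch_mask.reshape(grid, grid)

    # Expand each patch decision to its patch_size x patch_size pixel block.
    pixel_mask = patch_mask.repeat(patch_size, axis=0).repeat(patch_size, axis=1)
    return pixel_mask

mask = random_patch_mask(image_size=224, patch_size=16, mask_ratio=0.75)
hidden_patches = int(mask.sum()) // (16 * 16)
```

With a 224×224 image, 16×16 patches, and a masking ratio of 0.75, this hides 147 of the 196 patches. Larger patches or a higher ratio remove larger contiguous regions, which is the kind of change the paper relates to the semantic level of the recovered latent variables.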
Abstract
Masked autoencoder (MAE), a simple and effective self-supervised learning framework based on the reconstruction of masked image regions, has recently achieved prominent success in a variety of vision tasks. Despite the emergence of intriguing empirical observations on MAE, a theoretically principled understanding is still lacking. In this work, we formally characterize and justify existing empirical insights and provide theoretical guarantees of MAE. We formulate the underlying data-generating process as a hierarchical latent variable model and show that under reasonable assumptions, MAE provably identifies a set of latent variables in the hierarchical model, explaining why MAE can extract high-level information from pixels. Further, we show how key hyperparameters in MAE (the masking ratio and the patch size) determine which true latent variables to be recovered, therefore influencing…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Image Processing Techniques and Applications
Methods: Masked autoencoder
