Understanding Masked Autoencoders via Hierarchical Latent Variable Models

Lingjing Kong; Martin Q. Ma; Guangyi Chen; Eric P. Xing; Yuejie Chi; Louis-Philippe Morency; Kun Zhang

arXiv:2306.04898 · cs.LG · June 9, 2023 · 2 cites

TL;DR

This paper provides a theoretical understanding of masked autoencoders (MAE) by modeling data with hierarchical latent variables, explaining how hyperparameters influence the semantic level of learned representations.

Contribution

It introduces a hierarchical latent variable model to theoretically justify MAE's ability to extract high-level features and analyzes the impact of hyperparameters on the learned representations.

Findings

01 Under reasonable assumptions, MAE provably identifies a set of latent variables in the hierarchical model

02 The masking ratio and the patch size determine which latent variables are recovered, and therefore the semantic level of the learned representations

03 The theoretical explanations align with empirical observations
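
To make the two hyperparameters in the findings concrete, the following is a minimal sketch of how patch size and masking ratio jointly fix how many patches an encoder sees. It is not taken from the paper or its repository: the function names `patch_grid` and `random_mask`, the uniform random sampling, and the values 224 and 0.75 are illustrative assumptions.

```python
import random


def patch_grid(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches in a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    return (image_size // patch_size) ** 2


def random_mask(num_patches: int, mask_ratio: float, seed: int = 0):
    """Split patch indices into (visible, masked) by uniform random sampling.

    Hypothetical helper: assumes masking is uniform over patches.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)
    # Round down so at least one patch stays visible.
    num_masked = int(num_patches * mask_ratio)
    masked = sorted(order[:num_masked])
    visible = sorted(order[num_masked:])
    return visible, masked


if __name__ == "__main__":
    # Illustrative values only: a 224x224 image and a 0.75 masking ratio.
    for patch_size in (8, 16, 32):
        n = patch_grid(224, patch_size)
        visible, masked = random_mask(n, mask_ratio=0.75)
        print(patch_size, n, len(visible), len(masked))
```

Larger patches leave fewer, coarser units to mask, and a higher masking ratio leaves fewer visible patches to reconstruct from. The sketch only counts patches; it does not model which latent variables are recovered.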

Abstract

Masked autoencoder (MAE), a simple and effective self-supervised learning framework based on the reconstruction of masked image regions, has recently achieved prominent success in a variety of vision tasks. Despite the emergence of intriguing empirical observations on MAE, a theoretically principled understanding is still lacking. In this work, we formally characterize and justify existing empirical insights and provide theoretical guarantees of MAE. We formulate the underlying data-generating process as a hierarchical latent variable model and show that under reasonable assumptions, MAE provably identifies a set of latent variables in the hierarchical model, explaining why MAE can extract high-level information from pixels. Further, we show how key hyperparameters in MAE (the masking ratio and the patch size) determine which true latent variables to be recovered, therefore influencing…

Code & Models

Repositories

martinmamql/mae_understand (Official)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Image Processing Techniques and Applications

Methods: Masked autoencoder