Masked Frequency Modeling for Self-Supervised Visual Pre-Training

Jiahao Xie; Wei Li; Xiaohang Zhan; Ziwei Liu; Yew Soon Ong; Chen Change Loy

arXiv:2206.07706·cs.CV·April 26, 2023·29 cites

TL;DR

This paper introduces Masked Frequency Modeling (MFM), a novel self-supervised pre-training method that masks and predicts frequency components of images, leading to improved visual representations without extra data or model complexity.

Contribution

MFM is the first method to apply frequency-domain masking to self-supervised visual pre-training, and it demonstrates effectiveness across various models and tasks.

Findings

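01 MFM achieves competitive results on image classification and segmentation.

02 MFM improves robustness to various image corruptions.

03 Predicting masked frequency components reveals underlying image patterns more effectively than predicting masked patches in the spatial domain, which suffers from heavy spatial redundancy.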

Abstract

We present Masked Frequency Modeling (MFM), a unified frequency-domain-based approach for self-supervised pre-training of visual models. Instead of randomly inserting mask tokens to the input embeddings in the spatial domain, in this paper, we shift the perspective to the frequency domain. Specifically, MFM first masks out a portion of frequency components of the input image and then predicts the missing frequencies on the frequency spectrum. Our key insight is that predicting masked components in the frequency domain is more ideal to reveal underlying image patterns rather than predicting masked patches in the spatial domain, due to the heavy spatial redundancy. Our findings suggest that with the right configuration of mask-and-predict strategy, both the structural information within high-frequency components and the low-level statistics among low-frequency counterparts are useful in…
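The mask-and-predict idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the circular low-pass/high-pass mask, the `radius` parameter, the function names, and the mean absolute spectral error are all assumptions chosen for illustration, and a real pre-training setup would feed the corrupted image to a network rather than score a fixed array.

```python
import numpy as np

def frequency_mask(height, width, radius, keep="low"):
    """Boolean mask over a centered (fftshift-ed) spectrum; True marks kept frequencies.

    Hypothetical helper: a circular mask is one possible choice, assumed here.
    """
    yy, xx = np.ogrid[:height, :width]
    cy, cx = height // 2, width // 2
    inside = (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2
    return inside if keep == "low" else ~inside

def mask_frequencies(image, radius, keep="low"):
    """Zero out part of the spectrum and return the corrupted image.

    Also returns the original centered spectrum and the kept-frequency mask.
    """
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    kept = frequency_mask(*image.shape, radius, keep)
    corrupted = np.fft.ifft2(np.fft.ifftshift(spectrum * kept)).real
    return corrupted, spectrum, kept

def masked_frequency_error(predicted, spectrum, kept):
    """Mean absolute spectral error, measured only on the masked-out frequencies.

    Assumed stand-in for a frequency-domain prediction objective.
    """
    pred_spectrum = np.fft.fftshift(np.fft.fft2(predicted))
    missing = ~kept
    return np.mean(np.abs(pred_spectrum[missing] - spectrum[missing]))

# Usage on a random single-channel "image": keep low frequencies, mask the rest.
rng = np.random.default_rng(0)
image = rng.random((32, 32))
corrupted, spectrum, kept = mask_frequencies(image, radius=8, keep="low")
baseline_error = masked_frequency_error(corrupted, spectrum, kept)
perfect_error = masked_frequency_error(image, spectrum, kept)
```

In this sketch, a prediction equal to the original image recovers the masked frequencies exactly, while the corrupted input itself leaves a non-zero error on them. Passing `keep="high"` masks the low frequencies instead.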

Code & Models

Videos

Masked Frequency Modeling for Self-Supervised Visual Pre-Training · SlidesLive

Taxonomy

Topics: Image Processing Techniques and Applications · Optical measurement and interference techniques · Domain Adaptation and Few-Shot Learning