SRMAE: Masked Image Modeling for Scale-Invariant Deep Representations

Zhiming Wang; Lin Gu; Feng Lu

arXiv:2308.08884·cs.CV·August 21, 2023

TL;DR

SRMAE is a self-supervised masked image modeling approach that uses image scale as its training signal. It applies super-resolution techniques to learn scale-invariant representations and reports state-of-the-art results on low-resolution recognition tasks.

Contribution

The paper proposes a novel scale-aware masked autoencoder framework that incorporates super-resolution for improved scale-invariant visual representations.

Findings

01

Achieves 82.1% accuracy on ImageNet-1K after 400 epochs of pre-training.

02

Surpasses DeriveNet on the very low resolution (VLR) recognition task by 1.3%.

03

Outperforms the previous state of the art in low-resolution facial expression recognition by 9.48%, reaching 74.84% accuracy.

Abstract

Due to the prevalence of scale variance in natural images, we propose to use image scale as a self-supervised signal for Masked Image Modeling (MIM). Our method selects random patches from the input image and downsamples them to a low-resolution format. Our framework uses the latest advances in super-resolution (SR) to design the prediction head, which reconstructs the input from low-resolution clues and other patches. After 400 epochs of pre-training, our Super Resolution Masked Autoencoders (SRMAE) achieve an accuracy of 82.1% on the ImageNet-1K task. The image scale signal also allows SRMAE to capture scale-invariant representations. For the very low resolution (VLR) recognition task, our model achieves the best performance, surpassing DeriveNet by 1.3%. Our method also achieves an accuracy of 74.84% on the task of recognizing low-resolution facial expressions, surpassing…
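The input corruption described in the abstract (pick random patches, then reduce them to low resolution) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `downsample_random_patches`, the patch size, the selection ratio, the downsampling factor, the use of average pooling, and the nearest-neighbour upsampling that keeps the image shape unchanged are all assumptions made for illustration.

```python
import numpy as np

def downsample_random_patches(image, patch=16, ratio=0.75, factor=4, seed=0):
    """Replace a random subset of patches with low-resolution versions.

    image: float array of shape (H, W, C); H and W must be divisible by
    `patch`, and `patch` by `factor`. All defaults are illustrative
    assumptions, not values taken from the paper.
    Returns the corrupted image and a boolean mask over the patch grid.
    """
    rng = np.random.default_rng(seed)
    h, w, c = image.shape
    gh, gw = h // patch, w // patch

    # Choose which patches are degraded.
    n_sel = int(round(ratio * gh * gw))
    chosen = rng.choice(gh * gw, size=n_sel, replace=False)
    mask = np.zeros(gh * gw, dtype=bool)
    mask[chosen] = True
    mask = mask.reshape(gh, gw)

    out = image.copy()
    low = patch // factor
    for i, j in zip(*np.nonzero(mask)):
        ys, xs = i * patch, j * patch
        block = image[ys:ys + patch, xs:xs + patch]
        # Average-pool each factor x factor cell: (patch, patch, C) -> (low, low, C).
        small = block.reshape(low, factor, low, factor, c).mean(axis=(1, 3))
        # Nearest-neighbour upsample so the patch keeps its original size.
        out[ys:ys + patch, xs:xs + patch] = (
            small.repeat(factor, axis=0).repeat(factor, axis=1)
        )
    return out, mask
```

In a setup like this, a reconstruction head would be trained to recover the original `image` from `out`, with the loss typically restricted to the patches where `mask` is true. How SRMAE actually feeds the low-resolution patches to the encoder and SR-based prediction head is not specified in the text above.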

Taxonomy

Topics: Advanced Image Processing Techniques · Face recognition and analysis · Generative Adversarial Networks and Image Synthesis