The Devil is in the Frequency: Geminated Gestalt Autoencoder for Self-Supervised Visual Pre-Training

Hao Liu; Xinghua Jiang; Xin Li; Antai Guo; Deqiang Jiang; Bo Ren

arXiv:2204.08227 · cs.CV · April 19, 2022 · 1 citation

TL;DR

This paper introduces Ge$^2$-AE, a self-supervised visual pre-training method that reconstructs masked image content in both the pixel and frequency domains using dual decoders, with the aim of learning more robust representations.

Contribution

The authors present it as the first Masked Image Modeling (MIM) approach to use frequency-domain reconstruction, with geminated (paired) decoders, for improved visual representation learning.

Findings

01 Enhanced downstream recognition performance.

02 Robustness of learned representations confirmed.

03 Effective in both quantitative and qualitative evaluations.

Abstract

The self-supervised Masked Image Modeling (MIM) schema, which follows a "mask-and-reconstruct" pipeline of recovering content from a masked image, has recently attracted increasing interest in the multimedia community, owing to its excellent ability to learn visual representations from unlabeled data. Aiming to learn representations with highly abstracted semantics, one group of works attempts to reconstruct non-semantic pixels with a large-ratio masking strategy, which may suffer from an "over-smoothing" problem, while others directly infuse semantics into the targets in an offline way that requires extra data. In contrast, we shift the perspective to the Fourier domain, which naturally has a global perspective, and present a new MIM method, termed Geminated Gestalt Autoencoder (Ge$^{2}$-AE), for visual pre-training. Specifically, we equip our model with geminated decoders in charge of…
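The idea of supervising reconstruction in both the pixel and the Fourier domain can be illustrated with a small sketch. This is not the authors' implementation: the function names (`dft2`, `pixel_loss`, `frequency_loss`, `dual_domain_loss`, `changed_coefficients`), the squared-error form of both losses, and the `weight` parameter are all assumptions made for illustration, and a naive DFT on nested lists stands in for a real FFT on image tensors.

```python
import cmath


def dft2(img):
    """Naive, unnormalized 2D discrete Fourier transform of a list-of-lists image."""
    h, w = len(img), len(img[0])
    out = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            total = 0j
            for y in range(h):
                for x in range(w):
                    phase = -2j * cmath.pi * (u * y / h + v * x / w)
                    total += img[y][x] * cmath.exp(phase)
            out[u][v] = total
    return out


def pixel_loss(pred, target):
    """Mean squared error in the pixel domain."""
    h, w = len(target), len(target[0])
    return sum((pred[y][x] - target[y][x]) ** 2
               for y in range(h) for x in range(w)) / (h * w)


def frequency_loss(pred, target):
    """Mean squared magnitude of the spectrum difference (an assumed loss form)."""
    fp, ft = dft2(pred), dft2(target)
    h, w = len(target), len(target[0])
    return sum(abs(fp[u][v] - ft[u][v]) ** 2
               for u in range(h) for v in range(w)) / (h * w)


def dual_domain_loss(pixel_pred, freq_pred, target, weight=1.0):
    """Combine the two terms; one prediction per hypothetical decoder branch."""
    return pixel_loss(pixel_pred, target) + weight * frequency_loss(freq_pred, target)


def changed_coefficients(a, b, tol=1e-9):
    """Count the frequency coefficients that differ between two images."""
    fa, fb = dft2(a), dft2(b)
    return sum(1 for ra, rb in zip(fa, fb)
               for ca, cb in zip(ra, rb) if abs(ca - cb) > tol)


# Small demonstration: perturb a single pixel of a 4x4 image.
demo_target = [[0.0] * 4 for _ in range(4)]
demo_pred = [row[:] for row in demo_target]
demo_pred[1][2] = 1.0
print(changed_coefficients(demo_pred, demo_target))
print(dual_domain_loss(demo_pred, demo_pred, demo_target, weight=0.5))
```

In the demonstration, a change to one pixel alters all 16 frequency coefficients, which is one way to read the abstract's remark that the Fourier domain has a global perspective: every coefficient depends on every pixel, so a frequency-domain target constrains the whole image at once.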

Taxonomy

Topics: Advanced Vision and Imaging · Image Processing Techniques and Applications · Advanced Image and Video Retrieval Techniques