Efficient Image Pre-Training with Siamese Cropped Masked Autoencoders

Alexandre Eymaël; Renaud Vandeghen; Anthony Cioppa; Silvio Giancola; Bernard Ghanem; Marc Van Droogenbroeck

arXiv:2403.17823·cs.CV·February 18, 2025·3 cites

TL;DR

CropMAE is a self-supervised image pre-training method that trains on pairs of crops taken from the same image. It supports very high masking ratios and learns object-centric representations without video data or explicit motion cues.

Contribution

The paper proposes CropMAE, a Siamese pre-training approach that removes the need for video datasets and explicit motion while maintaining competitive performance and enabling higher masking ratios.

Findings

01. CropMAE achieves the highest masking ratio to date (98.5%).

02. It learns object-centric representations without explicit motion.

03. It significantly reduces pre-training and learning time.
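
To give a sense of what a 98.5% masking ratio means in practice, the sketch below counts how many patches stay visible to the encoder. The 224-pixel input, the 16-pixel patch size, and the truncating `int(...)` rounding are assumptions borrowed from common ViT-style masked autoencoder setups; this page does not state the values used in the paper.

```python
def visible_patches(image_size: int, patch_size: int, mask_ratio: float) -> int:
    """Number of patches left unmasked for a square image.

    Assumes a non-overlapping patch grid and truncating rounding
    (both are assumptions, not values taken from the paper).
    """
    num_patches = (image_size // patch_size) ** 2
    return int(num_patches * (1 - mask_ratio))


# Compare the 95% ratio quoted for SiamMAE with CropMAE's 98.5%.
for ratio in (0.95, 0.985):
    print(f"mask ratio {ratio:.3f}: {visible_patches(224, 16, ratio)} visible patches")
```

Under these assumptions the grid holds 196 patches, so moving from 95% to 98.5% leaves the encoder with only a handful of visible patches per masked view.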

Abstract

Self-supervised pre-training of image encoders is omnipresent in the literature, particularly following the introduction of masked autoencoders (MAE). Current efforts attempt to learn object-centric representations from motion in videos. In particular, SiamMAE recently introduced a Siamese network, training a shared-weight encoder from two frames of a video with a high asymmetric masking ratio (95%). In this work, we propose CropMAE, an alternative approach to the Siamese pre-training introduced by SiamMAE. Our method differs by exclusively considering pairs of images cropped differently from the same source image, rather than the conventional pairs of frames extracted from a video. CropMAE therefore alleviates the need for video datasets, while maintaining competitive performance and drastically reducing pre-training and learning time. Furthermore, we…
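
The pairing idea in the abstract can be illustrated with a minimal NumPy sketch: two random crops are drawn from one image, and only the second view is heavily masked. This is not the authors' implementation. The crop policy (`min_scale`), the nearest-neighbour resize, the 224/16 sizes, and every function name here are hypothetical choices made for illustration, and which view is masked is assumed from the asymmetric masking described for SiamMAE.

```python
import numpy as np


def random_crop(image, rng, min_scale=0.3):
    """Random rectangular crop; min_scale is a hypothetical lower bound per side."""
    h, w = image.shape[:2]
    ch = int(rng.integers(int(h * min_scale), h + 1))
    cw = int(rng.integers(int(w * min_scale), w + 1))
    top = int(rng.integers(0, h - ch + 1))
    left = int(rng.integers(0, w - cw + 1))
    return image[top:top + ch, left:left + cw]


def resize_nearest(image, size):
    """Nearest-neighbour resize to (size, size), used to keep the sketch dependency-free."""
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return image[rows][:, cols]


def patchify(image, patch):
    """Split an (H, W, C) image into flattened non-overlapping patches."""
    h, w, c = image.shape
    gh, gw = h // patch, w // patch
    x = image.reshape(gh, patch, gw, patch, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(gh * gw, patch * patch * c)


def make_pair(image, rng, size=224, patch=16, mask_ratio=0.985):
    """Build one training pair: an unmasked view and a heavily masked view.

    Both views are crops of the SAME image (no video frames involved).
    Returns the full first view, the visible patches of the second view,
    their indices, and the full second view as reconstruction target.
    """
    view_a = patchify(resize_nearest(random_crop(image, rng), size), patch)
    view_b = patchify(resize_nearest(random_crop(image, rng), size), patch)
    num_keep = int(len(view_b) * (1 - mask_ratio))
    keep = np.sort(rng.permutation(len(view_b))[:num_keep])
    return view_a, view_b[keep], keep, view_b


rng = np.random.default_rng(0)
image = rng.random((256, 320, 3))  # stand-in for a real photo
reference, visible, keep, target = make_pair(image, rng)
print(reference.shape, visible.shape, target.shape)
```

In a Siamese setup of this kind, a shared-weight encoder would process `reference` and `visible`, and a decoder would try to reconstruct `target` from them; those model components are left out of the sketch.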

Code & Models

Repositories

alexandre-eymael/cropmae (PyTorch, official)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis