DailyMAE: Towards Pretraining Masked Autoencoders in One Day

Jiantao Wu; Shentong Mo; Sara Atito; Zhenhua Feng; Josef Kittler; Muhammad Awais

arXiv:2404.00509 · cs.LG · April 2, 2024 · 1 citation

TL;DR

This paper introduces efficient training methods for masked autoencoders in self-supervised learning, enabling high-performance pretraining on ImageNet within 18 hours using accessible hardware, thus making SSL research more feasible and faster.

Contribution

The authors present optimized training recipes and techniques that significantly reduce pretraining time for masked autoencoders without sacrificing performance.

Findings

01 Achieved a 5.8x speedup in pretraining time

02 Trained MAE-Base/16 on ImageNet 1K in 18 hours

03 Enabled high-efficiency SSL training on a single machine

Abstract

Recently, masked image modeling (MIM), an important self-supervised learning (SSL) method, has drawn attention for its effectiveness in learning data representation from unlabeled data. Numerous studies underscore the advantages of MIM, highlighting how models pretrained on extensive datasets can enhance the performance of downstream tasks. However, the high computational demands of pretraining pose significant challenges, particularly within academic environments, thereby impeding the progress of SSL research. In this study, we propose efficient training recipes for MIM-based SSL that focus on mitigating data loading bottlenecks and employ progressive training techniques and other tricks to closely maintain pretraining performance. Our library enables the training of a MAE-Base/16 model on the ImageNet 1K dataset for 800 epochs within just 18 hours, using a single machine equipped…
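
The headline numbers imply a concrete throughput budget. The following is a back-of-the-envelope sketch, not a measurement from the paper. It assumes the full ImageNet-1K training split (1,281,167 images) is seen once per epoch and, for the baseline estimate, that the reported 5.8x speedup is measured against the same 800-epoch schedule. The function names are hypothetical and chosen only for illustration.

```python
# Back-of-the-envelope budget for "800 epochs in 18 hours".
# Assumption: every epoch covers the full ImageNet-1K train split.
IMAGENET_1K_TRAIN_IMAGES = 1_281_167
EPOCHS = 800
WALL_CLOCK_HOURS = 18
REPORTED_SPEEDUP = 5.8  # from the page's findings; baseline setup is assumed


def pretraining_budget(num_images, epochs, hours):
    """Return (seconds per epoch, images per second) needed to hit the budget."""
    total_seconds = hours * 3600
    seconds_per_epoch = total_seconds / epochs
    images_per_second = num_images * epochs / total_seconds
    return seconds_per_epoch, images_per_second


def implied_baseline_hours(hours, speedup):
    """Wall-clock time of the slower run, if the speedup applies to the same schedule."""
    return hours * speedup


seconds_per_epoch, images_per_second = pretraining_budget(
    IMAGENET_1K_TRAIN_IMAGES, EPOCHS, WALL_CLOCK_HOURS
)
baseline_hours = implied_baseline_hours(WALL_CLOCK_HOURS, REPORTED_SPEEDUP)

print(f"seconds per epoch: {seconds_per_epoch:.1f}")
print(f"images per second: {images_per_second:,.0f}")
print(f"implied baseline:  {baseline_hours:.1f} hours")
```

Under these assumptions the run has to sustain roughly 15.8 thousand images per second, or about 81 seconds per epoch, which helps explain why the abstract singles out data loading as a bottleneck.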

Code & Models

Repositories

erow/fastssl (PyTorch, official)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Lib · Mutual Information Machine/Mask Image Modeling