Pre-training Vision Transformers with Very Limited Synthesized Images

Ryo Nakamura; Hirokatsu Kataoka; Sora Takashima; Edgar Josafat Martinez Noriega; Rio Yokota; Nakamasa Inoue

arXiv:2307.14710·cs.CV·August 1, 2023

TL;DR

This paper shows that vision transformers can be pre-trained effectively on a fractal-based synthetic dataset containing only one image per category, with data augmentation standing in for additional instances. Scaled up, this approach matches or exceeds ImageNet-21k pre-training on downstream tasks.

Contribution

The authors introduce the one-instance fractal database (OFDB), a synthetic dataset with a single image per category, and show that it pre-trains vision transformers better than the original FDSL datasets and on par with or better than ImageNet-21k.

Findings

01

OFDB outperforms original FDSL datasets in downstream tasks.

02

Pre-training with OFDB matches or exceeds ImageNet-21k performance.

03

Small synthetic datasets can be highly effective for vision transformer pre-training.

Abstract

Formula-driven supervised learning (FDSL) is a pre-training method that relies on synthetic images generated from mathematical formulae such as fractals. Prior work on FDSL has shown that pre-training vision transformers on such synthetic datasets can yield competitive accuracy on a wide range of downstream tasks. These synthetic images are categorized according to the parameters of the mathematical formula that generates them. In the present work, we hypothesize that the process for generating different instances of the same category in FDSL can be viewed as a form of data augmentation. We validate this hypothesis by replacing the instances with data augmentation, which means we only need a single image per category. Our experiments show that this one-instance fractal database (OFDB) performs better than the original dataset where instances were explicitly generated. We further scale…
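The one-image-per-category idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`sample_ifs`, `render`, `augment`, `build_one_instance_dataset`), the parameter ranges, the image size, and the choice of flip plus circular shift as augmentation are all assumptions made for illustration. It assumes each category is defined by the parameters of a random iterated function system (IFS), renders exactly one image per category, and produces further training instances only through augmentation.

```python
import random


def sample_ifs(rng, n_maps=3):
    """Draw random affine maps (a, b, c, d, e, f); the parameters define a category.

    Assumption: linear coefficients in [-0.4, 0.4] keep every map contractive.
    """
    return [
        tuple(rng.uniform(-0.4, 0.4) for _ in range(4))
        + (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        for _ in range(n_maps)
    ]


def render(maps, rng, size=32, n_points=2000, burn_in=20):
    """Render one binary image of the IFS attractor with the chaos game."""
    x, y = 0.0, 0.0
    points = []
    for i in range(n_points + burn_in):
        a, b, c, d, e, f = rng.choice(maps)
        x, y = a * x + b * y + e, c * x + d * y + f
        if i >= burn_in:
            points.append((x, y))
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)
    span_x = (max(xs) - min_x) or 1.0
    span_y = (max(ys) - min_y) or 1.0
    img = [[0] * size for _ in range(size)]
    for px, py in points:
        col = min(size - 1, int((px - min_x) / span_x * size))
        row = min(size - 1, int((py - min_y) / span_y * size))
        img[row][col] = 1
    return img


def augment(img, rng):
    """Stand-in augmentation: random horizontal flip plus circular shift."""
    out = [row[:] for row in img]
    if rng.random() < 0.5:
        out = [row[::-1] for row in out]
    shift = rng.randrange(len(out[0]))
    return [row[shift:] + row[:shift] for row in out]


def build_one_instance_dataset(n_categories, seed=0):
    """One rendered image per category: a list of (image, label) pairs."""
    rng = random.Random(seed)
    return [(render(sample_ifs(rng), rng), label) for label in range(n_categories)]


if __name__ == "__main__":
    dataset = build_one_instance_dataset(n_categories=5)
    rng = random.Random(1)
    image, label = dataset[0]
    # Each epoch would see a differently augmented view of the same stored image.
    views = [augment(image, rng) for _ in range(4)]
    print(len(dataset), label, len(views))
```

In this sketch the stored dataset grows with the number of categories only, while variation within a category comes entirely from `augment` at training time.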

Code & Models

Repositories

ryoo-nakamura/ofdb (PyTorch, official)

Taxonomy

Topics: Image Retrieval and Classification Techniques · Fractal and DNA sequence analysis · Neural Networks and Applications