Scaling Backwards: Minimal Synthetic Pre-training?
Ryo Nakamura, Ryu Tadokoro, Ryosuke Yamada, Yuki M. Asano, Iro Laina, Christian Rupprecht, Nakamasa Inoue, Rio Yokota, Hirokatsu Kataoka

TL;DR
This paper shows that a minimal synthetic pre-training dataset, built from a single fractal with perturbations, can match large real-world datasets such as ImageNet-1k under full fine-tuning, which calls into question the need for extensive pre-training data.
Contribution
The paper introduces a minimal synthetic dataset derived from a single fractal, shows that pre-training remains effective with drastically fewer images, and examines how shape differences between categories and the amount of data affect transfer learning.
Findings
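Pre-training with minimal synthetic images performs on par with ImageNet-1k pre-training for full fine-tuning.
Shape differences between the artificial categories, even when indistinguishable to humans, are crucial for effective pre-training.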
Reducing synthetic images from 1k to 1 can improve performance.
Abstract
Pre-training and transfer learning are an important building block of current computer vision systems. While pre-training is usually performed on large real-world image datasets, in this paper we ask whether this is truly necessary. To this end, we search for a minimal, purely synthetic pre-training dataset that allows us to achieve performance similar to the 1 million images of ImageNet-1k. We construct such a dataset from a single fractal with perturbations. With this, we contribute three main findings. (i) We show that pre-training is effective even with minimal synthetic images, with performance on par with large-scale pre-training datasets like ImageNet-1k for full fine-tuning. (ii) We investigate the single parameter with which we construct artificial categories for our dataset. We find that while the shape differences can be indistinguishable to humans, they are crucial for…
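The idea of building categories from a single fractal with perturbations can be illustrated with a small sketch. This is not the authors' pipeline: it assumes the fractal is an iterated function system (IFS) rendered with the chaos game, and that each artificial category is the base IFS with every coefficient shifted by a random amount of at most `delta`. The coefficients in `BASE_IFS` and the names `perturb_ifs`, `render_points`, `rasterize`, and `build_dataset` are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical base fractal: two contractive affine maps (a, b, c, d, e, f),
# each acting as (x, y) -> (a*x + b*y + e, c*x + d*y + f).
BASE_IFS = [
    (0.5, -0.3, 0.3, 0.5, 0.0, 0.0),
    (-0.4, 0.2, 0.2, 0.4, 0.5, 0.3),
]


def perturb_ifs(ifs, delta, seed):
    """Shift every coefficient by a random amount in [-delta, delta] (assumed scheme)."""
    rng = random.Random(seed)
    return [tuple(c + rng.uniform(-delta, delta) for c in m) for m in ifs]


def render_points(ifs, n_points=2000, seed=0):
    """Chaos game: repeatedly apply a randomly chosen affine map to one point."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    points = []
    for _ in range(n_points):
        a, b, c, d, e, f = rng.choice(ifs)
        x, y = a * x + b * y + e, c * x + d * y + f
        points.append((x, y))
    return points


def rasterize(points, size=32):
    """Map points onto a binary size x size grid (1 = at least one point in the cell)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)
    span_x = (max(xs) - min_x) or 1.0
    span_y = (max(ys) - min_y) or 1.0
    grid = [[0] * size for _ in range(size)]
    for x, y in points:
        col = min(size - 1, int((x - min_x) / span_x * size))
        row = min(size - 1, int((y - min_y) / span_y * size))
        grid[row][col] = 1
    return grid


def build_dataset(n_categories=4, delta=0.05, size=32):
    """One image per category; the category label selects the perturbation seed."""
    dataset = []
    for label in range(n_categories):
        ifs = perturb_ifs(BASE_IFS, delta, seed=label)
        dataset.append((rasterize(render_points(ifs), size), label))
    return dataset
```

In this sketch `delta` plays the role of the single parameter that controls how far categories differ in shape: a small `delta` gives categories that look nearly identical, a larger one gives visibly different shapes. Calling `build_dataset(n_categories=1000, delta=0.05)` would yield 1000 labelled grids derived from one base fractal.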
