Can You Learn to See Without Images? Procedural Warm-Up for Vision Transformers
Zachary Shinnick, Liangze Jiang, Hemanth Saratchandran, Damien Teney, Anton van den Hengel

TL;DR
This paper introduces a pretraining method for vision transformers using procedurally generated data without visual content, which enhances data efficiency and performance on image classification tasks.
Contribution
The authors propose a novel warm-up pretraining approach with procedural data that improves vision transformers' data efficiency and downstream accuracy.
Findings
Allocating just 1% of the ImageNet-1K training budget to procedural data improves accuracy by over 1.7%.
In terms of its effect on performance, the procedural warm-up is equivalent to 28% of the ImageNet-1K data.
The method enhances convergence speed and data efficiency of vision transformers.
Abstract
Transformers are remarkably versatile, suggesting the existence of generic inductive biases beneficial across modalities. In this work, we explore a new way to instil such biases in vision transformers (ViTs) through pretraining on procedurally generated data devoid of visual or semantic content. We generate this data with simple algorithms such as formal grammars, so the results bear no relationship to either natural or synthetic images. We use this procedurally generated data to pretrain ViTs in a warm-up phase that bypasses their visual patch embedding mechanisms, thus encouraging the models to internalise abstract computational priors. When followed by standard image-based training, this warm-up significantly improves data efficiency, convergence speed, and downstream performance. On ImageNet-1K, for example, allocating just 1% of the training budget to procedural data improves…
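To make "procedurally generated data devoid of visual or semantic content" concrete, here is a minimal sketch of sampling token sequences from a simple formal grammar. It assumes a balanced-bracket (Dyck-style) grammar purely for illustration; the abstract does not say which grammars the authors use, and the names `sample_dyck`, `is_balanced`, `encode`, and `VOCAB` are hypothetical. The resulting integer sequences could be fed to a transformer's token pathway directly, without going through a patch embedding.

```python
import random

# Assumption: a three-bracket Dyck-style grammar, chosen only as an example
# of a "simple formal grammar"; it is not taken from the paper.
OPEN_TO_CLOSE = {"(": ")", "[": "]", "{": "}"}
VOCAB = sorted(list(OPEN_TO_CLOSE) + list(OPEN_TO_CLOSE.values()))
SYMBOL_TO_ID = {sym: idx for idx, sym in enumerate(VOCAB)}


def sample_dyck(length, rng, max_depth=8):
    """Sample a balanced bracket string with exactly `length` symbols."""
    if length % 2 != 0 or length < 0:
        raise ValueError("length must be a non-negative even number")
    if max_depth < 1:
        raise ValueError("max_depth must be at least 1")
    stack, out = [], []
    for i in range(length):
        remaining = length - i  # symbols left to emit, including this one
        # Opening is allowed only if every open bracket can still be closed.
        can_open = len(stack) < max_depth and len(stack) + 1 <= remaining - 1
        if stack and (not can_open or rng.random() < 0.5):
            out.append(OPEN_TO_CLOSE[stack.pop()])
        else:
            opener = rng.choice(list(OPEN_TO_CLOSE))
            stack.append(opener)
            out.append(opener)
    return "".join(out)


def is_balanced(s):
    """Check that every bracket in `s` is closed in the correct order."""
    stack = []
    for ch in s:
        if ch in OPEN_TO_CLOSE:
            stack.append(ch)
        elif not stack or OPEN_TO_CLOSE[stack.pop()] != ch:
            return False
    return not stack


def encode(s):
    """Map a bracket string to a list of integer token ids."""
    return [SYMBOL_TO_ID[ch] for ch in s]


if __name__ == "__main__":
    rng = random.Random(0)
    sequence = sample_dyck(16, rng)
    print(sequence, encode(sequence))
```

The sequences carry nested, long-range structure (each closing bracket must match an earlier opening one) but no pixels and no semantics, which is the kind of content-free signal the abstract describes.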
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
