Can You Learn to See Without Images? Procedural Warm-Up for Vision Transformers

Zachary Shinnick; Liangze Jiang; Hemanth Saratchandran; Damien Teney; Anton van den Hengel

arXiv:2511.13945·cs.CV·March 24, 2026

TL;DR

This paper introduces a warm-up pretraining phase for vision transformers (ViTs) that uses procedurally generated data devoid of visual or semantic content. Followed by standard image-based training, the warm-up improves data efficiency, convergence speed, and downstream performance.

Contribution

The authors propose a novel warm-up pretraining approach with procedural data that improves vision transformers' data efficiency and downstream accuracy.

Findings

01

On ImageNet-1K, allocating just 1% of the training budget to procedural data improves accuracy by over 1.7%.

02

In its effect on performance, the procedural-data warm-up is equivalent to using 28% of the ImageNet-1K data.

03

The method enhances convergence speed and data efficiency of vision transformers.

Abstract

Transformers are remarkably versatile, suggesting the existence of generic inductive biases beneficial across modalities. In this work, we explore a new way to instil such biases in vision transformers (ViTs) through pretraining on procedurally generated data devoid of visual or semantic content. We generate this data with simple algorithms such as formal grammars, so the results bear no relationship to either natural or synthetic images. We use this procedurally generated data to pretrain ViTs in a warm-up phase that bypasses their visual patch embedding mechanisms, thus encouraging the models to internalise abstract computational priors. When followed by standard image-based training, this warm-up significantly improves data efficiency, convergence speed, and downstream performance. On ImageNet-1K, for example, allocating just 1% of the training budget to procedural data improves…
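The abstract names formal grammars as one way to generate procedural data. As a minimal sketch of what such a generator could look like, the Python below samples balanced bracket strings (a Dyck-style grammar) as integer token sequences. The choice of grammar, the function names `generate_dyck` and `is_balanced`, and the token encoding are all assumptions made for illustration; the text above does not specify which grammars or encodings the authors use. Sequences like these have no visual content, so a model would presumably consume them through a token embedding rather than the patch embedding.

```python
import random


def generate_dyck(num_pairs, num_types=4, rng=None):
    """Sample a balanced bracket sequence as a list of integer token ids.

    Hypothetical encoding: token 2*t opens bracket type t, token 2*t + 1 closes it.
    """
    if rng is None:
        rng = random.Random()
    tokens = []
    stack = []              # bracket types that are currently open
    opens_left = num_pairs  # how many brackets we may still open
    while opens_left > 0 or stack:
        # Open a new bracket if any remain and either nothing is open
        # or a fair coin flip says so; otherwise close the innermost one.
        if opens_left > 0 and (not stack or rng.random() < 0.5):
            t = rng.randrange(num_types)
            tokens.append(2 * t)
            stack.append(t)
            opens_left -= 1
        else:
            t = stack.pop()
            tokens.append(2 * t + 1)
    return tokens


def is_balanced(tokens):
    """Check that every closing token matches the most recent open bracket."""
    stack = []
    for tok in tokens:
        t, is_close = divmod(tok, 2)
        if not is_close:
            stack.append(t)
        elif not stack or stack.pop() != t:
            return False
    return not stack


if __name__ == "__main__":
    sample = generate_dyck(8, num_types=4, rng=random.Random(0))
    print(sample, is_balanced(sample))
```

Each sample has exactly `2 * num_pairs` tokens drawn from a vocabulary of `2 * num_types` ids, so sequence length and vocabulary size can be set independently of any image resolution or patch size.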

Code & Models

Models

Hugging Face model: zlshinnick/procedural-warmup-vit

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications