Universal pre-training by iterated random computation

Peter Bloem

arXiv:2506.20057 · cs.LG · June 26, 2025

TL;DR

This paper explores pre-training models on randomly generated data. It justifies the approach theoretically from the perspective of algorithmic complexity, and shows empirically that models pre-trained this way exhibit zero-shot in-context learning and that fine-tuning them afterwards converges faster and generalizes better.

Contribution

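It derives theoretical results, grounded in algorithmic complexity and related to Solomonoff induction, that justify pre-training on synthetic data, and it demonstrates empirically that such pre-training improves zero-shot performance and speeds up convergence during fine-tuning.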

Findings

01. Synthetic pre-training enables zero-shot in-context learning across a variety of datasets.

02. This zero-shot performance improves with scale.

03. Fine-tuning after synthetic pre-training leads to faster convergence and better generalization on real-world data.

Abstract

We investigate the use of randomly generated data for the sake of pre-training a model. We justify this approach theoretically from the perspective of algorithmic complexity, building on recent research that shows that sequence models can be trained to approximate Solomonoff induction. We derive similar, but complementary theoretical results. We show empirically that synthetically generated data can be used to pre-train a model before the data is seen. We replicate earlier results that models trained this way show zero-shot in-context learning across a variety of datasets, and that this performance improves with scale. We extend earlier results to real-world data, and show that finetuning a model after pre-training offers faster convergence and better generalization.
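
The abstract does not spell out how the random data is produced, so the following is only a minimal sketch of one way to read "iterated random computation": start from uniformly random tokens and repeatedly pass them through a freshly sampled, untrained model. Everything here is an assumption made for illustration, including the choice of a tiny recurrent network, the helper names `random_rnn`, `transform`, and `generate`, and all parameter values; it is not taken from the paper or from the pbloem/up repository.

```python
import math
import random


def random_rnn(vocab_size, hidden_size, rng):
    """Sample weights for a tiny, untrained recurrent network (hypothetical helper)."""
    def mat(rows, cols):
        return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

    return {
        "embed": mat(vocab_size, hidden_size),   # token -> hidden input
        "recur": mat(hidden_size, hidden_size),  # hidden -> hidden
        "out": mat(hidden_size, vocab_size),     # hidden -> output logits
    }


def transform(seq, net, rng, temperature=0.5):
    """Feed a token sequence through the random network, sampling one token per step."""
    hidden_size = len(net["recur"])
    vocab_size = len(net["out"][0])
    scale = math.sqrt(hidden_size)
    h = [0.0] * hidden_size
    out = []
    for tok in seq:
        # Recurrent update: h <- tanh(embed[tok] + W h / sqrt(hidden_size))
        h = [
            math.tanh(
                net["embed"][tok][i]
                + sum(net["recur"][i][j] * h[j] for j in range(hidden_size)) / scale
            )
            for i in range(hidden_size)
        ]
        logits = [
            sum(h[i] * net["out"][i][k] for i in range(hidden_size)) / temperature
            for k in range(vocab_size)
        ]
        # Softmax weights, shifted by the max logit for numerical stability.
        top = max(logits)
        weights = [math.exp(v - top) for v in logits]
        out.append(rng.choices(range(vocab_size), weights=weights)[0])
    return out


def generate(seq_len=32, vocab_size=16, hidden_size=8, depth=3, seed=0):
    """Start from uniform noise and apply `depth` independently sampled networks."""
    rng = random.Random(seed)
    seq = [rng.randrange(vocab_size) for _ in range(seq_len)]
    for _ in range(depth):
        net = random_rnn(vocab_size, hidden_size, rng)
        seq = transform(seq, net, rng)
    return seq


if __name__ == "__main__":
    print(generate())
```

Under these assumptions, each pass replaces the sequence with the sampled output of another random computation, so the result is more structured than uniform noise while involving no real-world data. A sequence model would then be pre-trained on many such sequences with an ordinary next-token objective before any task data is seen.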

Code & Models

Repositories

pbloem/up (jax, Official)

Taxonomy

Topics: Machine Learning and Algorithms · Optimization and Search Problems