Transformers Pretrained on Procedural Data Contain Modular Structures for Algorithmic Reasoning

Zachary Shinnick; Liangze Jiang; Hemanth Saratchandran; Anton van den Hengel; Damien Teney

arXiv:2505.22308 · cs.LG · May 29, 2025

TL;DR

Pretraining small transformers on synthetic procedural data induces modular, transferable structures that improve algorithmic reasoning, and different generation rules place those structures in different parts of the model.

Contribution

This paper demonstrates that simple synthetic procedural data can induce distinct, modular inductive structures in transformer models, improving their reasoning skills and enabling disentanglement of knowledge and reasoning.

Findings

01. Procedural rules induce distinct model structures.

02. Attention layers often carry transferable information.

03. Multiple rules can be combined to reinforce capabilities.
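To make the second finding concrete, here is a minimal sketch of what a partial-transfer step could look like: only the attention parameters are copied from a pretrained model into a freshly initialised one. This is an illustration under stated assumptions, not the authors' code. The function name `partial_transfer`, the `"attn"` naming tag, and the toy parameter dictionaries are all hypothetical.

```python
def partial_transfer(pretrained, fresh, keep=("attn",)):
    """Merge two parameter dicts (name -> values).

    Parameters whose name contains any tag in `keep` are taken from
    `pretrained`; every other parameter keeps its `fresh` value.
    The "attn" tag is an assumed naming convention, not one from the paper.
    """
    merged = {}
    for name, value in fresh.items():
        selected = any(tag in name for tag in keep)
        if selected and name in pretrained:
            merged[name] = pretrained[name]
        else:
            merged[name] = value
    return merged


# Toy stand-ins for model weights (hypothetical names and values).
pretrained = {"layer0.attn.q": [0.5, -0.2], "layer0.mlp.w": [1.0, 1.0]}
fresh = {"layer0.attn.q": [0.0, 0.0], "layer0.mlp.w": [0.0, 0.0]}

merged = partial_transfer(pretrained, fresh)
```

With a real framework the same selection would be applied to the model's parameter dictionary before fine-tuning; comparing downstream accuracy for different `keep` tags is one way to probe which components carry the transferable information.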

Abstract

Pretraining on large, semantically rich datasets is key for developing language models. Surprisingly, recent studies have shown that even synthetic data, generated procedurally through simple semantic-free algorithms, can yield some of the same benefits as natural language pretraining. It is unclear what specific capabilities such simple synthetic data instils in a model, where these capabilities reside in the architecture, and how they manifest within its weights. In this short paper, we identify several beneficial forms of procedural data, together with specific algorithmic reasoning skills that improve in small transformers. Our core finding is that different procedural rules instil distinct but complementary inductive structures in the model. With extensive ablations and partial-transfer experiments, we discover that these structures reside in different parts of the model. Attention…
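As an illustration of what "generated procedurally through simple semantic-free algorithms" can mean, the sketch below samples balanced bracket strings from a fixed rule. The abstract as shown here does not name the rules the authors use, so this particular rule, along with the names `sample_brackets` and `is_balanced`, is an assumption chosen only for illustration.

```python
import random

OPENERS = "([{<"
CLOSERS = ")]}>"


def sample_brackets(n_pairs, n_types=2, rng=None):
    """Sample a balanced bracket string with `n_pairs` pairs.

    Hypothetical procedural rule: at each step either open a random
    bracket type or close the most recently opened one.
    """
    if rng is None:
        rng = random.Random(0)
    stack, out = [], []
    remaining = n_pairs
    while remaining > 0 or stack:
        can_open = remaining > 0
        can_close = bool(stack)
        if can_open and (not can_close or rng.random() < 0.5):
            kind = rng.randrange(n_types)
            stack.append(kind)
            out.append(OPENERS[kind])
            remaining -= 1
        else:
            out.append(CLOSERS[stack.pop()])
    return "".join(out)


def is_balanced(seq):
    """Check that every closer matches the most recent unmatched opener."""
    match = dict(zip(CLOSERS, OPENERS))
    stack = []
    for ch in seq:
        if ch in match:
            if not stack or stack.pop() != match[ch]:
                return False
        else:
            stack.append(ch)
    return not stack


rng = random.Random(42)
corpus = [sample_brackets(8, n_types=3, rng=rng) for _ in range(5)]
```

Strings like these carry no semantics, yet predicting the next token requires tracking a stack, which is the kind of algorithmic structure such pretraining data is meant to exercise.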

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications