How Do Transformers Learn Variable Binding in Symbolic Programs?

Yiwei Wu; Atticus Geiger; Raphaël Millière

arXiv:2505.20896·cs.LG·June 3, 2025

TL;DR

This paper shows that a Transformer trained to dereference variables in symbolic programs learns a systematic binding and dereferencing mechanism that mimics symbolic computation, even though the architecture has no built-in binding operations.

Contribution

It reveals the developmental stages and mechanisms by which Transformers acquire variable binding capabilities, including the use of residual streams as addressable memory.

Findings

01. Transformers develop a systematic dereferencing mechanism during training.

02. Attention heads learn to route information across token positions.

03. The model can dynamically track variable bindings across layers.

Abstract

Variable binding, the ability to associate variables with values, is fundamental to symbolic computation and cognition. Although classical architectures typically implement variable binding via addressable memory, it is not well understood how modern neural networks lacking built-in binding operations may acquire this capacity. We investigate this by training a Transformer to dereference queried variables in symbolic programs where variables are assigned either numerical constants or other variables. Each program requires following chains of variable assignments up to four steps deep to find the queried value, and also contains irrelevant chains of assignments acting as distractors. Our analysis reveals a developmental trajectory with three distinct phases during training: (1) random prediction of numerical constants, (2) a shallow heuristic prioritizing early variable assignments,…
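To make the task in the abstract concrete, here is a minimal Python sketch of what dereferencing a queried variable involves. It is a reference solver for illustration only, not the authors' code or the model's learned mechanism. The program syntax (`name = value` lines), the helper names `parse_program` and `dereference`, and the convention of counting depth as the number of assignments followed are all assumptions, since the page does not give the paper's exact format.

```python
from typing import Dict, List, Tuple


def parse_program(lines: List[str]) -> Dict[str, str]:
    """Map each variable name to the right-hand side of its assignment.

    Assumes a hypothetical 'name = value' syntax, one assignment per line.
    """
    bindings: Dict[str, str] = {}
    for line in lines:
        name, value = line.split("=")
        bindings[name.strip()] = value.strip()
    return bindings


def dereference(bindings: Dict[str, str], query: str, max_depth: int = 4) -> Tuple[str, int]:
    """Follow the assignment chain from `query` until a numerical constant is found.

    Returns the constant and the number of assignments followed.
    """
    current = query
    for depth in range(1, max_depth + 1):
        value = bindings[current]
        if value.isdigit():
            # Reached a numerical constant: the chain ends here.
            return value, depth
        # Otherwise the value is another variable name: keep following.
        current = value
    raise ValueError(f"chain from {query!r} is longer than {max_depth} steps")


# A hypothetical program: the chain d -> c -> b -> a -> 7 is the queried one,
# while x and y form an irrelevant (distractor) chain.
program = ["a = 7", "b = a", "x = 3", "c = b", "y = x", "d = c"]
bindings = parse_program(program)

print(dereference(bindings, "d"))  # ('7', 4)
print(dereference(bindings, "y"))  # ('3', 2)
```

The dictionary lookup here plays the role of the addressable memory that classical architectures provide directly. The paper's question is how a Transformer without such a primitive comes to compute the same answer.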

Taxonomy

Topics: Evolutionary Algorithms and Applications · Metaheuristic Optimization Algorithms Research · Artificial Intelligence in Games