Transformers Can Learn Rules They've Never Seen: Proof of Computation Beyond Interpolation
Andy Gray

TL;DR
This paper demonstrates that transformers can learn and explicitly represent rules they have never seen during training, challenging the idea that their generalization reduces to similarity-based interpolation.
Contribution
The study reports controlled experiments in which transformers infer rules absent from training and produce symbolic derivations, which the author presents as evidence of rule learning beyond interpolation.
Findings
A two-layer transformer recovered a pure XOR cellular automaton rule even though specific local input patterns were removed from training (best 100%; 47/60 converged runs).
Transformers outperformed interpolation baselines in symbolic operator chain tasks.
Explicit intermediate steps improve transformer performance on rule inference.
Abstract
A central question in the LLM debate is whether transformers can infer rules absent from training, or whether apparent generalisation reduces to similarity-based interpolation over observed examples. We test a strong interpolation-only hypothesis in two controlled settings: one where interpolation is ruled out by construction and proof, and one where success requires emitting intermediate symbolic derivations rather than only final answers. In Experiment 1, we use a cellular automaton with a pure XOR transition rule and remove specific local input patterns from training; since XOR is linearly inseparable, each held-out pattern's nearest neighbours have the opposite label, so similarity-based predictors fail on the held-out region. Yet a two-layer transformer recovers the rule (best 100%; 47/60 converged runs), and circuit extraction identifies XOR computation. Performance depends on…
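The nearest-neighbour argument in Experiment 1 can be checked by brute force on a small case. The sketch below is illustrative only and is not the paper's code: it assumes a 3-cell neighbourhood, Hamming distance as the similarity measure, and a majority vote over the closest training patterns. The names `xor_rule`, `hamming`, and `nearest_neighbour_predict` are hypothetical.

```python
from itertools import product


def xor_rule(cells):
    """Return the XOR (parity) of a tuple of bits."""
    out = 0
    for c in cells:
        out ^= c
    return out


def hamming(a, b):
    """Count the positions at which two equal-length patterns differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbour_predict(query, train):
    """Majority vote over all training patterns at minimum Hamming distance."""
    d_min = min(hamming(query, p) for p in train)
    votes = [label for p, label in train.items() if hamming(query, p) == d_min]
    return int(sum(votes) * 2 > len(votes))


# Assumption: a 3-cell neighbourhood, giving 8 possible local patterns.
patterns = list(product((0, 1), repeat=3))

failures = 0
for held_out in patterns:
    # Train on every pattern except the held-out one.
    train = {p: xor_rule(p) for p in patterns if p != held_out}
    pred = nearest_neighbour_predict(held_out, train)
    failures += pred != xor_rule(held_out)

print(f"nearest-neighbour errors: {failures} of {len(patterns)}")
# prints "nearest-neighbour errors: 8 of 8"
```

Flipping any single bit flips the parity, so every pattern at Hamming distance 1 from a held-out pattern carries the opposite label, and the similarity-based vote is wrong for each held-out pattern in this toy setting.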
Taxonomy
Topics: Ferroelectric and Negative Capacitance Devices · Cellular Automata and Applications · Machine Learning and Algorithms
