Gaussian Match-and-Copy: A Minimalist Benchmark for Studying Transformer Induction
Antoine Gonon, Alexandre Cordonnier, Nicolas Boumal

TL;DR
This paper introduces Gaussian Match-and-Copy (GMC), a minimalist benchmark for studying how transformer models develop retrieval-and-copy behavior. GMC disentangles retrieval from memorization, and the paper also analyzes the associated optimization dynamics.
Contribution
The paper presents GMC as a new benchmark that isolates retrieval in transformers, and provides a theoretical analysis of the gradient descent dynamics that lead to match selection.
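The summary describes GMC only at a high level: retrieval driven by pure second-order correlation signals. The sketch below is a hypothetical construction in that spirit, not the paper's exact task. The context is a sequence of i.i.d. Gaussian vectors, the query is a noisy copy of one context vector, and the target is that vector's successor. Correlating the query with every context vector (an inner product, i.e. a second-order statistic) identifies the match, and copying its successor solves the task. The function names `sample_gmc_like` and `inner_product_match_and_copy`, as well as the noise model and scaling, are illustrative assumptions.

```python
import numpy as np


def sample_gmc_like(n: int, d: int, noise: float, rng: np.random.Generator):
    """Hypothetical Gaussian match-and-copy instance (NOT the paper's exact construction).

    Returns the context X (n x d), a query q that is a noisy copy of X[j],
    the target X[j + 1] (the successor of the match), and the match index j.
    """
    # i.i.d. Gaussian "tokens", scaled so each row has squared norm close to 1.
    X = rng.standard_normal((n, d)) / np.sqrt(d)
    # Match position; drawn from [0, n - 2] so that a successor X[j + 1] exists.
    j = int(rng.integers(0, n - 1))
    # Query correlated with X[j] only through a shared Gaussian component.
    q = X[j] + noise * rng.standard_normal(d) / np.sqrt(d)
    return X, q, X[j + 1], j


def inner_product_match_and_copy(X: np.ndarray, q: np.ndarray) -> np.ndarray:
    """Select the context vector most correlated with q, then copy its successor."""
    scores = X[:-1] @ q         # only positions that have a successor are candidates
    i = int(np.argmax(scores))  # hard match selection
    return X[i + 1]             # copy step
```

In high dimension d, the match score ⟨q, X[j]⟩ concentrates near 1 while non-match scores fluctuate on the order of 1/√d, which is why a plain inner-product readout retrieves the right successor in this toy setting.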
Findings
GMC retains key qualitative aspects of transformer match-and-copy behavior.
Different architectures show varying retrieval capabilities on GMC.
Gradient descent implicitly biases solutions towards max-margin match selection.
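The summary states only that gradient descent is implicitly biased toward max-margin match selection. As a minimal illustration of why a margin matters for softmax attention (this is not the paper's simplified attention model or its proof), assume match selection is read off from attention scores. Scaling the scores by a factor β concentrates the softmax weight on the highest-scoring position, and the leftover weight on non-matches obeys 1 − w ≤ (n − 1)·exp(−β·margin), where the margin is the gap between the match score and its strongest competitor. The helper names and score values are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array."""
    z = z - z.max()  # shifting by the max does not change the result
    e = np.exp(z)
    return e / e.sum()


def attention_on_match(scores: np.ndarray, match: int, beta: float) -> float:
    """Softmax attention weight on the match position when scores are scaled by beta."""
    return float(softmax(beta * scores)[match])


def margin(scores: np.ndarray, match: int) -> float:
    """Gap between the match score and the best competing (non-match) score."""
    others = np.delete(scores, match)
    return float(scores[match] - others.max())


# Illustrative scores: position 1 is the match, position 2 is the closest competitor.
scores = np.array([0.2, 1.0, 0.5, -0.3])
for beta in (1.0, 10.0, 100.0):
    print(beta, attention_on_match(scores, match=1, beta=beta))
```

The weight on the match grows toward 1 as β increases, and for a fixed β a larger margin leaves less attention on non-matches, which is the intuition behind preferring max-margin selection.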
Abstract
Match-and-copy is a core retrieval primitive used at inference time by large language models to retrieve a matching token from the context and then copy its successor. Yet, understanding how this behavior emerges on natural data is challenging because retrieval and memorization are entangled. To disentangle the two, we introduce Gaussian Match-and-Copy (GMC), a minimalist benchmark that isolates long-range retrieval through pure second-order correlation signals. Numerical investigations show that this task retains key qualitative aspects of how Transformers develop match-and-copy circuits in practice, and separates architectures by their retrieval capabilities. We also analyze the optimization dynamics in a simplified attention setting. Although many solutions are a priori possible under a regression objective, including ones that do not implement retrieval, we identify an implicit-bias…
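On discrete tokens, the match-and-copy primitive described in the abstract can be sketched in a few lines: take the last token as the query, find an earlier occurrence of it in the context, and output the token that followed that occurrence. This is a plain reference implementation of the behavior, not how a transformer computes it. Choosing the most recent earlier match is an assumption made here for definiteness, and `match_and_copy` is a hypothetical helper name.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def match_and_copy(tokens: Sequence[T]) -> Optional[T]:
    """Predict the next token by match-and-copy.

    Uses the last token as the query, finds its most recent earlier
    occurrence, and returns the token that followed it.
    Returns None if the query does not occur earlier in the context.
    """
    if not tokens:
        return None
    query = tokens[-1]
    # Scan earlier positions from right to left, excluding the query position itself.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == query:
            return tokens[i + 1]  # copy the successor of the match
    return None


print(match_and_copy(list("abcdxbc")))  # last token "c" matched at index 2, so "d" is copied
```

For example, on the context "a b c d x b c", the query "c" matches position 2, so the prediction is the successor "d".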
