Start Making Sense(s): A Developmental Probe of Attention Specialization Using Lexical Ambiguity
Pamela D. Rivière, Sean Trott

TL;DR
This paper systematically probes attention mechanisms in Transformer language models to understand how they develop specialized functions for word sense disambiguation, revealing developmental differences and robustness in attention heads.
Contribution
It introduces a pipeline for analyzing attention mechanisms and uses it to characterize the developmental stages and robustness of disambiguation-related heads across model sizes.
Findings
Larger models have more robust disambiguation heads.
Attention heads' disambiguation behavior varies with development stage.
Ablation of key heads impairs disambiguation performance.
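Ablating a head means removing its contribution and then checking whether disambiguation degrades. This page does not say which ablation scheme the authors use. The sketch below assumes zero-ablation in a toy, pure-Python multi-head attention layer. A head's output is set to zero before the heads are concatenated, which removes that head's contribution to any subsequent output projection. All function names are hypothetical.

```python
import math
from typing import List, Optional, Set, Tuple

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # zip(*b) iterates over the columns of b.
    return [[sum(x * y for x, y in zip(r, col)) for col in zip(*b)] for r in a]


def head_output(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention for one head: softmax(QK^T / sqrt(d)) V."""
    d = len(q[0])
    scores = [[sum(qi * ki for qi, ki in zip(qr, kr)) / math.sqrt(d) for kr in k]
              for qr in q]
    weights = [softmax(r) for r in scores]
    return matmul(weights, v)


def multi_head(heads: List[Tuple[Matrix, Matrix, Matrix]],
               ablate: Optional[Set[int]] = None) -> Matrix:
    """Concatenate per-head outputs, zeroing every head index in `ablate`.

    Zero-ablation is an assumption here; mean-ablation is a common alternative.
    """
    ablate = ablate or set()
    outs = []
    for h, (q, k, v) in enumerate(heads):
        o = head_output(q, k, v)
        if h in ablate:
            o = [[0.0] * len(r) for r in o]  # knock out this head entirely
        outs.append(o)
    # Concatenate along the feature dimension, token by token.
    return [sum((o[t] for o in outs), []) for t in range(len(outs[0]))]


if __name__ == "__main__":
    q = k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[1.0, 2.0], [3.0, 4.0]]
    full = multi_head([(q, k, v), (q, k, v)])
    ablated = multi_head([(q, k, v), (q, k, v)], ablate={1})
    print(ablated[0][2:])  # [0.0, 0.0]
```

In a real experiment, one would then rerun the disambiguation measure with the chosen heads ablated and compare it against the intact model.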
Abstract
Despite an in-principle understanding of self-attention matrix operations in Transformer language models (LMs), it remains unclear precisely how these operations map onto interpretable computations or functions--and how or when individual attention heads develop specialized attention patterns. Here, we present a pipeline to systematically probe attention mechanisms, and we illustrate its value by leveraging lexical ambiguity--where a single word has multiple meanings--to isolate attention mechanisms that contribute to word sense disambiguation. We take a "developmental" approach: first, using publicly available Pythia LM checkpoints, we identify inflection points in disambiguation performance for each LM in the suite; in 14M and 410M, we identify heads whose attention to disambiguating words covaries with overall disambiguation performance across development. We then stress-test the…
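The developmental analysis described above can be sketched as two steps over a series of checkpoints. First, locate where disambiguation performance changes most sharply. Second, find heads whose attention from the ambiguous word to the disambiguating word rises and falls with that performance. The abstract does not specify either criterion, so this sketch makes two assumptions. (a) The inflection point is approximated by the largest gain between consecutive checkpoints, which is where a sigmoid-shaped learning curve is steepest. (b) Covariation is measured with Pearson correlation against a hypothetical threshold of 0.8. The scores, head names (such as "L2.H5"), and attention values are invented illustrations, not results from the paper.

```python
import math
from typing import Dict, List, Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series carries no covariation signal
    return cov / (sx * sy)


def steepest_step(scores: Sequence[float]) -> int:
    """Index of the checkpoint that ends the largest single-step gain.

    Assumed stand-in for an inflection point: on a sigmoid-shaped curve,
    the inflection point is where the slope is largest.
    """
    gains = [b - a for a, b in zip(scores, scores[1:])]
    return 1 + max(range(len(gains)), key=lambda i: gains[i])


def covarying_heads(head_attention: Dict[str, List[float]],
                    scores: List[float],
                    threshold: float = 0.8) -> Dict[str, float]:
    """Heads whose attention to the disambiguating word tracks overall score.

    head_attention maps a (hypothetical) head label to its mean attention
    weight from the ambiguous word to the disambiguating word, one value
    per checkpoint, aligned with `scores`.
    """
    out = {}
    for head, attn in head_attention.items():
        r = pearson_r(attn, scores)
        if r >= threshold:
            out[head] = r
    return out


if __name__ == "__main__":
    scores = [0.50, 0.51, 0.53, 0.70, 0.78, 0.80]  # invented per-checkpoint scores
    attn = {
        "L2.H5": [0.05, 0.06, 0.08, 0.30, 0.41, 0.44],  # tracks performance
        "L0.H1": [0.20, 0.19, 0.21, 0.20, 0.18, 0.19],  # roughly flat
    }
    print(steepest_step(scores))  # 3
    print(sorted(covarying_heads(attn, scores)))  # ['L2.H5']
```

A correlation over checkpoints only shows that a head's attention and the model's performance change together. The ablation step is what tests whether the head actually contributes to disambiguation.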
