Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient
George Wang, Jesse Hoogland, Stan van Wingerden, Zach Furman, Daniel Murfet

TL;DR
This paper introduces refined Local Learning Coefficients (rLLCs) to analyze transformer language models, revealing how attention heads differentiate and specialize during training and uncovering a previously unidentified multigram circuit.
Contribution
It develops refined LLCs for better interpretability of model development, providing insights into attention head differentiation and emergent structures in transformers.
Findings
Attention heads differentiate into distinct roles during training
Identifies a new multigram circuit in the model
Provides a quantitative toolkit for developmental interpretability
Abstract
We introduce refined variants of the Local Learning Coefficient (LLC), a measure of model complexity grounded in singular learning theory, to study the development of internal structure in transformer language models during training. By applying these refined LLCs (rLLCs) to individual components of a two-layer attention-only transformer, we gain novel insights into the progressive differentiation and specialization of attention heads. Our methodology reveals how attention heads differentiate into distinct functional roles over the course of training, analyzes the types of data these heads specialize to process, and discovers a previously unidentified multigram circuit. These findings demonstrate that rLLCs provide a principled, quantitative toolkit for developmental interpretability, which aims to understand models through their evolution across the learning process.…
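To make the idea of applying an LLC to an individual component more concrete, here is a minimal toy sketch. It is not the authors' implementation. It assumes the SGLD-based estimator commonly used in the LLC literature, λ̂ = nβ · (E[L(w)] − L(w*)), with samples drawn from a posterior localized around w*. The "refinement" is modeled simply by letting only a chosen subset of coordinates move while the rest stay frozen. The quadratic loss, the function name `restricted_llc`, and all hyperparameter values are illustrative assumptions.

```python
import math
import random


def loss(w):
    """Toy loss with a minimum at the origin: L(w) = 0.5 * sum(w_i^2)."""
    return 0.5 * sum(x * x for x in w)


def grad(w):
    """Gradient of the toy loss above."""
    return list(w)


def restricted_llc(w_star, free_idx, n_beta=100.0, gamma=1.0,
                   step=1e-3, n_steps=20000, burn_in=2000, seed=0):
    """Estimate an LLC at w_star while sampling only the coordinates in free_idx.

    Hypothetical helper: n_beta stands for the product n * beta, and gamma is
    the strength of the term that keeps samples localized near w_star.
    """
    rng = random.Random(seed)
    w = list(w_star)
    base = loss(w_star)
    total, count = 0.0, 0
    for t in range(n_steps):
        g = grad(w)
        for i in free_idx:
            # Langevin step on the free coordinates only; frozen ones never move.
            drift = n_beta * g[i] + gamma * (w[i] - w_star[i])
            w[i] += -0.5 * step * drift + math.sqrt(step) * rng.gauss(0.0, 1.0)
        if t >= burn_in:
            total += loss(w)
            count += 1
    # Estimator: n * beta * (average sampled loss - loss at w_star).
    return n_beta * (total / count - base)


w_star = [0.0] * 6
print("all six coordinates free:", restricted_llc(w_star, range(6)))
print("two coordinates free:    ", restricted_llc(w_star, [0, 1]))
```

For this regular quadratic loss the estimate should land near half the number of free coordinates, so restricting sampling to a component yields a smaller value than sampling the full parameter vector. In a real transformer the loss would be the training loss evaluated on minibatches, and the free coordinates would be the weights of a single attention head.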
Taxonomy
Topics: Neural Networks and Applications · CCD and CMOS Imaging Sensors
Methods: Softmax · Attention Is All You Need
