TL;DR
This paper investigates how transformers learn semantic associations, such as the link between "bird" and "flew", by analyzing training dynamics and deriving closed-form expressions for the weights at early stages of training that explain how these associations emerge.
Contribution
The paper uses a leading-term approximation of the gradients to explain how semantic associations develop in transformers, and provides a mechanistic, interpretable model of this process.
Findings
The authors derive closed-form expressions for transformer weights at early stages of training.
Each set of weights is a composition of bigram, token-interchangeability, and context functions.
The theoretical weights closely match those learned by real-world LLMs.
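To give a feel for why a leading-term view of the gradient can produce bigram-like weights, here is a minimal toy sketch. It is not the paper's derivation or its expressions: it assumes a much simpler model (a single lookup table `W` with logits `W[prev]`, no attention), zero initialization, and one full-batch gradient step on the mean cross-entropy loss. The function name `bigram_first_step_weights` and the tiny corpus are hypothetical and chosen only for illustration.

```python
from collections import Counter


def bigram_first_step_weights(tokens, lr=1.0):
    """Weights of a toy next-token model after one gradient step from W = 0.

    Toy assumption (not the paper's setup): logits for the next token are
    W[prev], with no attention. At W = 0 the softmax is uniform (1/V), so the
    gradient of the mean cross-entropy w.r.t. W[a, b] is
        count(a) / (N * V) - count(a, b) / N.
    """
    vocab = sorted(set(tokens))
    vocab_size = len(vocab)
    pairs = list(zip(tokens, tokens[1:]))  # (previous token, next token)
    num_pairs = len(pairs)
    pair_counts = Counter(pairs)
    prev_counts = Counter(prev for prev, _ in pairs)

    weights = {}
    for a in vocab:
        for b in vocab:
            grad = (prev_counts[a] / (num_pairs * vocab_size)
                    - pair_counts[(a, b)] / num_pairs)
            weights[(a, b)] = -lr * grad  # one step of gradient descent
    return weights


tokens = "the bird flew . the bird sang . the plane flew .".split()
W = bigram_first_step_weights(tokens)
print(W[("bird", "flew")] > 0 > W[("bird", "the")])
```

In this toy setting the first-step weights are a centered bigram statistic: pairs that co-occur more often than a uniform guess would predict, such as ("bird", "flew"), get positive weight, and each row of `W` sums to zero.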
Abstract
Semantic associations such as the link between "bird" and "flew" are foundational for language modeling as they enable models to go beyond memorization and instead generalize and generate coherent text. Understanding how these associations are learned and represented in language models is essential for connecting deep learning with linguistic theory and developing a mechanistic foundation for large language models. In this work, we analyze how these associations emerge from natural language data in attention-based language models through the lens of training dynamics. By leveraging a leading-term approximation of the gradients, we develop closed-form expressions for the weights at early stages of training that explain how semantic associations first take shape. Through our analysis, we reveal that each set of weights of the transformer has closed-form expressions as simple compositions…
