Disaggregation Reveals Hidden Training Dynamics: The Case of Agreement Attraction
James A. Michaelov, Catherine Arnett

TL;DR
This paper uses disaggregation techniques inspired by psycholinguistics to analyze training dynamics in language models, revealing how they develop grammatical understanding and heuristics over different training phases.
Contribution
It introduces a fine-grained analysis method that disaggregates model performance by the conditions of carefully constructed evaluation datasets, uncovering intermediate learning stages and the heuristics language models rely on during training.
Findings
Early in training, model behavior aligns with heuristics such as word frequency and local context.
Over the course of training, behavior shifts toward generalized grammatical rules.
Disaggregation reveals distinct learning stages.
Abstract
Language models generally produce grammatical text, but they are more likely to make errors in certain contexts. Drawing on paradigms from psycholinguistics, we carry out a fine-grained analysis of those errors in different syntactic contexts. We demonstrate that by disaggregating over the conditions of carefully constructed datasets and comparing model performance on each over the course of training, it is possible to better understand the intermediate stages of grammatical learning in language models. Specifically, we identify distinct phases of training where language model behavior aligns with specific heuristics such as word frequency and local context rather than generalized grammatical rules. We argue that taking this approach to analyzing language model behavior more generally can serve as a powerful tool for understanding the intermediate learning phases, overall training…
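The disaggregation idea in the abstract can be illustrated with a minimal Python sketch. It assumes a standard minimal-pair setup that this summary does not specify: an agreement item counts as correct when the model assigns a higher log-probability to the grammatical verb form than to the ungrammatical one, and items are labeled by subject number and attractor (intervening noun) number, e.g. "SP" for a singular subject with a plural attractor, as in "The key to the cabinets...". The `Item` schema, the function names, the checkpoints, and the toy scores are all hypothetical; they are not the authors' code or data. The toy inputs are built so that two different heuristics yield the same pooled accuracy while producing different per-condition patterns.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One agreement minimal pair scored at one checkpoint (hypothetical schema)."""
    step: int                  # training step of the checkpoint
    condition: str             # subject number + attractor number, e.g. "SP"
    logp_grammatical: float    # log P(grammatical verb | sentence prefix)
    logp_ungrammatical: float  # log P(ungrammatical verb | sentence prefix)

    @property
    def correct(self) -> bool:
        # Assumed minimal-pair criterion: the grammatical verb must score higher.
        return self.logp_grammatical > self.logp_ungrammatical


def accuracy_by(items, key):
    """Mean accuracy for each group, where key(item) names the group."""
    hits, totals = defaultdict(int), defaultdict(int)
    for item in items:
        group = key(item)
        totals[group] += 1
        hits[group] += int(item.correct)
    return {group: hits[group] / totals[group] for group in sorted(totals)}


def aggregate_curve(items):
    """Accuracy per checkpoint, pooling every condition (the usual summary)."""
    return accuracy_by(items, lambda item: item.step)


def disaggregated_curves(items):
    """Accuracy per (condition, checkpoint): the disaggregated view."""
    return accuracy_by(items, lambda item: (item.condition, item.step))


def toy_item(step, condition, correct):
    # Invented log-probabilities, chosen only so that Item.correct == correct.
    return Item(step, condition, -1.0 if correct else -2.0, -2.0 if correct else -1.0)


CONDITIONS = ("SS", "SP", "PS", "PP")  # first letter: subject, second: attractor

# Toy data, NOT results from the paper:
# step 1000 mimics "agree with the nearest noun" (right only when numbers match);
# step 2000 mimics "always prefer the singular verb" (right only for singular subjects).
TOY_ITEMS = [toy_item(1000, c, c[0] == c[1]) for c in CONDITIONS]
TOY_ITEMS += [toy_item(2000, c, c[0] == "S") for c in CONDITIONS]

if __name__ == "__main__":
    print(aggregate_curve(TOY_ITEMS))  # {1000: 0.5, 2000: 0.5}
    for (condition, step), acc in disaggregated_curves(TOY_ITEMS).items():
        print(condition, step, acc)
```

In this toy data, pooled accuracy is 0.5 at both checkpoints, which hides a switch from matching the nearest noun (correct on SS and PP) to a frequency-like default of always preferring the singular verb (correct on SS and SP). Only the per-condition curves expose that change; both heuristics here are hypothetical stand-ins chosen to echo the word-frequency and local-context heuristics the abstract mentions.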
