Under the Hood: Using Diagnostic Classifiers to Investigate and Improve how Language Models Track Agreement Information
Mario Giulianelli, Jacqueline Harding, Florian Mohnert, Dieuwke Hupkes, Willem Zuidema

TL;DR
This paper uses diagnostic classifiers to analyze how neural language models represent number agreement, revealing when and where errors occur, and demonstrates how this understanding can be used to improve model accuracy.
Contribution
It shows that diagnostic classifiers, trained to predict number from a language model's internal states, can be used both to understand how the model tracks agreement information and to improve its agreement accuracy.
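As a rough illustration of the idea, a diagnostic classifier can be sketched as a simple linear probe fitted on hidden states to predict a binary number label (singular vs. plural). The sketch below is not the authors' setup: the hidden states are synthetic stand-ins rather than real LSTM activations, and the names `train_probe`, `probe_accuracy`, and `number_direction` are hypothetical.

```python
import numpy as np

def train_probe(states, labels, lr=0.5, epochs=300):
    """Fit a logistic-regression probe mapping states (n, d) to binary labels (n,)."""
    n, d = states.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(states @ w + b)))  # predicted P(plural)
        grad = p - labels                            # gradient of cross-entropy w.r.t. logits
        w -= lr * (states.T @ grad) / n
        b -= lr * grad.mean()
    return w, b

def probe_accuracy(w, b, states, labels):
    """Fraction of states whose thresholded probe output matches the label."""
    preds = (states @ w + b) > 0
    return float((preds == labels.astype(bool)).mean())

# ASSUMPTION: synthetic "hidden states" with number encoded along one direction.
# In a real analysis these would be activations recorded from the language model.
rng = np.random.default_rng(0)
dim, n = 20, 400
number_direction = rng.normal(size=dim)
number_direction /= np.linalg.norm(number_direction)
labels = rng.integers(0, 2, size=n).astype(float)  # 0 = singular, 1 = plural
states = rng.normal(size=(n, dim)) + 3.0 * np.outer(2 * labels - 1, number_direction)

# Train on the first 300 states, evaluate on the remaining 100.
w, b = train_probe(states[:300], labels[:300])
accuracy = probe_accuracy(w, b, states[300:], labels[300:])
print(f"held-out probe accuracy: {accuracy:.2f}")
```

High held-out accuracy suggests the label is linearly decodable from the states; low accuracy at a given layer or timestep would suggest the information is absent or not linearly accessible there.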
Findings
Diagnostic classifiers reveal detailed representations of agreement information.
Interventions based on diagnostic insights significantly improve model accuracy.
Number information is sometimes corrupted in neural language models, leading to errors.
Abstract
How do neural language models keep track of number agreement between subject and verb? We show that 'diagnostic classifiers', trained to predict number from the internal states of a language model, provide a detailed understanding of how, when, and where this information is represented. Moreover, they give us insight into when and where number information is corrupted in cases where the language model ends up making agreement errors. To demonstrate the causal role played by the representations we find, we then use agreement information to influence the course of the LSTM during the processing of difficult sentences. Results from such an intervention reveal a large increase in the language model's accuracy. Together, these results show that diagnostic classifiers give us an unrivalled detailed look into the representation of linguistic information in neural models, and demonstrate that…
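One way to picture the intervention described in the abstract is as a small gradient step on a hidden state, taken in the direction that makes a trained diagnostic classifier more confident in the correct number label. The sketch below is an assumption about how such a nudge could look, not the paper's procedure: the probe weights `w` and `b`, the hidden state `h`, and the step size are made-up values, and `intervene` is a hypothetical helper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def intervene(h, w, b, target, step=0.5):
    """Take one gradient step on hidden state h that lowers the probe's
    binary cross-entropy for the label `target` (0 = singular, 1 = plural)."""
    p = sigmoid(h @ w + b)
    grad_h = (p - target) * w  # gradient of the cross-entropy with respect to h
    return h - step * grad_h

# ASSUMPTION: toy probe parameters and a toy hidden state; in practice the probe
# would be trained on recorded activations and h taken from the running model.
w = np.array([1.0, -2.0, 0.5])
b = 0.0
h = np.array([-0.5, 0.5, 0.0])
target = 1.0  # the correct number label for this sentence

before = sigmoid(h @ w + b)
h_new = intervene(h, w, b, target)
after = sigmoid(h_new @ w + b)
print(f"probe P(target) before: {before:.3f}, after: {after:.3f}")
```

In a full experiment the modified state would be fed back into the LSTM so that later predictions depend on it; whether agreement accuracy then improves is the causal question the abstract refers to.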
Taxonomy
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
