Interpretability of Language Models via Task Spaces
Lucas Weber, Jaap Jumelet, Elia Bruni, Dieuwke Hupkes

TL;DR
This paper introduces a novel interpretability approach for language models by constructing task spaces based on linguistic phenomena, revealing how models process language and how their generalization improves with size and training.
Contribution
It introduces two methods, similarity probing and fine-tuning via gradient differentials (FTGD), for analyzing how the internal representations of LMs relate to linguistic concepts, offering insight into how models generalize and process language.
Findings
Larger models generalize better to overarching linguistic concepts.
Pre-training makes linguistic processing more distributed.
Generalization patterns remain stable throughout training.
Abstract
The usual way to interpret language models (LMs) is to test their performance on different benchmarks and subsequently infer their internal processes. In this paper, we present an alternative approach, concentrating on the quality of LM processing, with a focus on their language abilities. To this end, we construct 'linguistic task spaces' -- representations of an LM's language conceptualisation -- that shed light on the connections LMs draw between language phenomena. Task spaces are based on the interactions of the learning signals from different linguistic phenomena, which we assess via a method we call 'similarity probing'. To disentangle the learning signals of linguistic phenomena, we further introduce a method called 'fine-tuning via gradient differentials' (FTGD). We apply our methods to language models of three different scales and find that larger models generalise better to…
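The idea of a task space built from interacting learning signals can be illustrated with a minimal sketch. It assumes, purely for illustration, that each linguistic phenomenon is described by a "transfer profile": a vector of performance changes on every phenomenon after fine-tuning on that one. Two phenomena are then treated as similar when their profiles point in the same direction. The phenomenon names, the numbers, and the choice of cosine similarity are all hypothetical and are not taken from the paper, whose actual procedure may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def task_space(transfer):
    """Pairwise similarity between phenomena, computed from their transfer profiles.

    `transfer[name]` is assumed to hold the change in performance on each
    phenomenon after fine-tuning on `name` (a hypothetical setup).
    """
    names = list(transfer)
    return {a: {b: cosine(transfer[a], transfer[b]) for b in names} for a in names}


# Made-up toy values for illustration only; these are NOT results from the paper.
toy_transfer = {
    "agreement_a": [0.30, 0.20, 0.00],
    "agreement_b": [0.20, 0.30, 0.00],
    "island_effects": [0.00, 0.00, 0.25],
}

space = task_space(toy_transfer)
for name, row in space.items():
    print(name, {other: round(sim, 2) for other, sim in row.items()})
```

In this toy setup the two agreement phenomena end up close together while the third sits apart, which is the kind of structure a task space is meant to expose.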
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Focus
