Interpretability of Language Models via Task Spaces

Lucas Weber; Jaap Jumelet; Elia Bruni; Dieuwke Hupkes

arXiv:2406.06441·cs.CL·June 11, 2024

TL;DR

This paper introduces an interpretability approach for language models that constructs task spaces from linguistic phenomena. The task spaces show which phenomena a model treats as related and how its generalization changes with model size and training.

Contribution

It presents two new methods, similarity probing and fine-tuning via gradient differentials (FTGD), for analyzing how an LM's internal processing relates to linguistic concepts, offering insight into model generalization.

Findings

01 Larger models generalize better to overarching linguistic concepts.

02 Pre-training increases distributedness of linguistic processing.

03 Generalization patterns remain stable throughout training.

Abstract

The usual way to interpret language models (LMs) is to test their performance on different benchmarks and subsequently infer their internal processes. In this paper, we present an alternative approach, concentrating on the quality of LM processing, with a focus on their language abilities. To this end, we construct 'linguistic task spaces' -- representations of an LM's language conceptualisation -- that shed light on the connections LMs draw between language phenomena. Task spaces are based on the interactions of the learning signals from different linguistic phenomena, which we assess via a method we call 'similarity probing'. To disentangle the learning signals of linguistic phenomena, we further introduce a method called 'fine-tuning via gradient differentials' (FTGD). We apply our methods to language models of three different scales and find that larger models generalise better to…
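The idea of a task space built from interacting learning signals can be illustrated with a toy sketch. This is not the authors' similarity probing or FTGD procedure, whose details are not given here. It assumes, purely for illustration, that each linguistic phenomenon is summarised by one gradient vector and that interaction is measured by cosine similarity. The function name `task_similarity_matrix`, the phenomenon labels, and the random toy gradients are all hypothetical.

```python
import numpy as np


def task_similarity_matrix(grads):
    """Return (names, S), where S[i, j] is the cosine similarity between
    the gradient vectors of phenomena i and j.

    Assumption: `grads` maps a phenomenon name to a flat gradient vector.
    This is an illustrative stand-in, not the paper's method.
    """
    names = sorted(grads)
    G = np.stack([np.asarray(grads[n], dtype=float).ravel() for n in names])
    norms = np.linalg.norm(G, axis=1, keepdims=True)
    G = G / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return names, G @ G.T


# Toy data (hypothetical): two phenomena share a learning signal,
# the third is unrelated.
rng = np.random.default_rng(0)
shared = rng.normal(size=64)
toy_grads = {
    "phenomenon_a": shared + 0.1 * rng.normal(size=64),
    "phenomenon_b": shared + 0.1 * rng.normal(size=64),
    "phenomenon_c": rng.normal(size=64),
}

names, sim = task_similarity_matrix(toy_grads)
for name, row in zip(names, sim):
    print(name, np.round(row, 2))
```

In this toy setting, the rows of the matrix act as coordinates for each phenomenon: phenomena whose learning signals point in similar directions end up close together.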

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Focus