Feature Interactions Reveal Linguistic Structure in Language Models

Jaap Jumelet; Willem Zuidema

arXiv:2306.12181 · cs.CL · June 22, 2023 · 1 citation

TL;DR

This paper investigates feature interactions in language models to understand their role in capturing linguistic structure, evaluating various attribution methods through formal language tasks and real-world case studies.

Contribution

It introduces a grey box methodology that uses probabilistic context-free grammars (PCFGs) to assess feature interaction attribution methods, and shows that these methods can reveal linguistic structure in language models.
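To make the PCFG idea concrete, here is a minimal sketch of sampling strings from a probabilistic context-free grammar. The grammar `TOY_PCFG`, the helper names `sample` and `is_balanced`, and the depth cap are all hypothetical choices for illustration; they are not the grammars or the code used in the paper.

```python
import random

# Hypothetical toy grammar (not from the paper): balanced brackets.
# Each nonterminal maps to (probability, right-hand side) pairs that sum to 1.
TOY_PCFG = {
    "S": [
        (0.4, ["(", "S", ")"]),
        (0.2, ["S", "S"]),
        (0.4, ["(", ")"]),
    ],
}


def sample(symbol, rng, depth=0, max_depth=20):
    """Expand `symbol` top-down into a list of terminal tokens."""
    if symbol not in TOY_PCFG:
        return [symbol]  # terminal symbol: emit as-is
    rules = TOY_PCFG[symbol]
    if depth >= max_depth:
        # Assumption for this sketch: cap recursion by keeping only
        # rules whose right-hand side contains no nonterminals.
        rules = [r for r in rules if all(s not in TOY_PCFG for s in r[1])]
    weights = [p for p, _ in rules]
    _, rhs = rng.choices(rules, weights=weights, k=1)[0]
    tokens = []
    for child in rhs:
        tokens.extend(sample(child, rng, depth + 1, max_depth))
    return tokens


def is_balanced(tokens):
    """Check that every '(' is closed by a later ')'."""
    open_count = 0
    for tok in tokens:
        open_count += 1 if tok == "(" else -1
        if open_count < 0:
            return False
    return open_count == 0


rng = random.Random(0)
strings = [sample("S", rng) for _ in range(5)]
```

Because the generating rules are known exactly, a model trained on such strings has a known ground-truth structure, which is what makes it possible to check whether an attribution method recovers the rules.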

Findings

01 Some attribution methods can uncover grammatical rules in formal language tasks.

02 Evaluation on language models provides new insights into their acquired linguistic structures.

03 Certain configurations improve the faithfulness of interaction attributions.

Abstract

We study feature interactions in the context of feature attribution methods for post-hoc interpretability. In interpretability research, getting to grips with feature interactions is increasingly recognised as an important challenge, because interacting features are key to the success of neural networks. Feature interactions allow a model to build up hierarchical representations for its input, and might provide an ideal starting point for the investigation into linguistic structure in language models. However, uncovering the exact role that these interactions play is also difficult, and a diverse range of interaction attribution methods has been proposed. In this paper, we focus on the question which of these methods most faithfully reflects the inner workings of the target models. We work out a grey box methodology, in which we train models to perfection on a formal language…
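As a rough illustration of what a feature interaction is, the sketch below scores a pair of features with a second-order difference against a baseline: the model output with both features present, minus the outputs with each one ablated, plus the output with both ablated. This is a generic textbook-style measure chosen for illustration, not necessarily one of the attribution methods evaluated in the paper; `toy_model`, `ablate`, `pairwise_interaction`, and the all-zeros baseline are hypothetical.

```python
def toy_model(x):
    # Hypothetical model: features 0 and 1 interact multiplicatively,
    # feature 2 contributes additively on its own.
    return x[0] * x[1] + x[2]


def ablate(x, idxs, baseline):
    """Replace the features at positions `idxs` with their baseline values."""
    return [baseline[i] if i in idxs else v for i, v in enumerate(x)]


def pairwise_interaction(f, x, i, j, baseline):
    """Second-order difference: the part of f(x) that needs i and j together."""
    return (
        f(x)
        - f(ablate(x, {i}, baseline))
        - f(ablate(x, {j}, baseline))
        + f(ablate(x, {i, j}, baseline))
    )


x = [2.0, 3.0, 5.0]
baseline = [0.0, 0.0, 0.0]  # assumption: zeros stand in for "feature absent"

score_01 = pairwise_interaction(toy_model, x, 0, 1, baseline)  # 6.0
score_02 = pairwise_interaction(toy_model, x, 0, 2, baseline)  # 0.0
```

The score for features 0 and 1 equals their product term, while features 0 and 2 score zero because they only contribute additively. Note that the result depends on the choice of baseline, so a different baseline can yield a different score for the same pair.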

Code & Models

Repositories

jumelet/fidam-eval (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)

Methods: Focus