On the Pitfalls of Analyzing Individual Neurons in Language Models

Omer Antverg; Yonatan Belinkov

arXiv:2110.07483 · cs.CL · August 2, 2022

TL;DR

This paper examines probe-based methods for ranking individual neurons in language models. It shows that evaluating a ranking with the same probe that produced it confounds probe quality with ranking quality, and that information encoded in the representations is not necessarily the information the model uses.

Contribution

The paper identifies two pitfalls in the common neuron-ranking methodology, separates probe quality from ranking quality, and compares two recent ranking methods with a simple one the authors introduce, evaluating all three on both encoded and used information.

Findings

01. Separates probe quality from ranking quality in neuron analysis

02. Shows that encoded information differs from information used by the model

03. Evaluates ranking methods with respect to both factors

Abstract

While many studies have shown that linguistic information is encoded in hidden word representations, few have studied individual neurons, to show how and in which neurons it is encoded. Among these, the common approach is to use an external probe to rank neurons according to their relevance to some linguistic attribute, and to evaluate the obtained ranking using the same probe that produced it. We show two pitfalls in this methodology: 1. It confounds distinct factors: probe quality and ranking quality. We separate them and draw conclusions on each. 2. It focuses on encoded information, rather than information that is used by the model. We show that these are not the same. We compare two recent ranking methods and a simple one we introduce, and evaluate them with regard to both of these aspects.
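The first pitfall can be illustrated with a minimal sketch that keeps the ranking step and the evaluation step separate, so that one ranking can be scored by a probe that did not produce it. This is an illustration under stated assumptions, not the authors' implementation: the function names, the ridge-regularized least-squares probe, the mean-difference ranking, and the synthetic data are all hypothetical choices made here for brevity.

```python
import numpy as np

def fit_linear_probe(X, y, n_classes, ridge=1e-3):
    """Fit a one-vs-rest least-squares linear probe (assumed probe type)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])   # append a bias column
    Y = np.eye(n_classes)[y]                        # one-hot targets
    A = Xb.T @ Xb + ridge * np.eye(Xb.shape[1])     # ridge-regularized normal equations
    return np.linalg.solve(A, Xb.T @ Y)             # shape: (d + 1, n_classes)

def probe_accuracy(W, X, y):
    """Accuracy of a fitted probe on held-out data."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return float(np.mean(np.argmax(Xb @ W, axis=1) == y))

def rank_by_probe_weights(X, y, n_classes):
    """Probe-based ranking: order neurons by total absolute probe weight."""
    W = fit_linear_probe(X, y, n_classes)
    importance = np.abs(W[:-1]).sum(axis=1)         # drop the bias row
    return np.argsort(-importance)

def rank_by_mean_difference(X, y, n_classes):
    """Probe-free ranking (illustrative): order neurons by how far apart
    the per-class mean activations are, summed over class pairs."""
    means = np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])
    diffs = np.abs(means[:, None, :] - means[None, :, :]).sum(axis=(0, 1))
    return np.argsort(-diffs)

def evaluate_ranking(ranking, k, X_tr, y_tr, X_te, y_te, n_classes):
    """Score a ranking by training a fresh probe on its top-k neurons only."""
    idx = ranking[:k]
    W = fit_linear_probe(X_tr[:, idx], y_tr, n_classes)
    return probe_accuracy(W, X_te[:, idx], y_te)

def make_toy_data(n, d=20, informative=(3, 7), seed=0):
    """Synthetic 'representations': only the informative neurons depend on y."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, d))
    for j in informative:
        X[:, j] += 3.0 * (2 * y - 1)                # shift by -3 or +3 per class
    return X, y

if __name__ == "__main__":
    X_tr, y_tr = make_toy_data(2000, seed=0)
    X_te, y_te = make_toy_data(500, seed=1)
    rankings = {
        "probe weights": rank_by_probe_weights(X_tr, y_tr, 2),
        "mean difference": rank_by_mean_difference(X_tr, y_tr, 2),
    }
    for name, ranking in rankings.items():
        acc = evaluate_ranking(ranking, 2, X_tr, y_tr, X_te, y_te, 2)
        print(f"{name}: top-2 neurons {sorted(ranking[:2].tolist())}, accuracy {acc:.3f}")
```

Because `evaluate_ranking` takes the ranking as an argument, any ranking can be paired with any probe, which is one way to keep the two factors apart. The sketch covers only encoded information; testing whether the model uses a neuron would require intervening on the model itself, which is outside what this toy setup can show.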

Videos

On the Pitfalls of Analyzing Individual Neurons in Language Models · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications