Linear Explanations for Individual Neurons

Tuomas Oikarinen; Tsui-Wei Weng

arXiv:2405.06855·cs.LG·May 14, 2024

TL;DR

This paper introduces a linear explanation method for individual neurons in neural networks, revealing that understanding neurons requires analyzing their entire activation range rather than just their highest activations.

Contribution

The paper proposes a novel linear explanation approach for neurons and introduces an automatic evaluation method using simulation to assess explanation quality.

Findings

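01. The highest activations account for only a very small percentage of a neuron's causal effect.

02. Inputs causing lower activations are diverse and cannot be reliably predicted from the high activations alone.

03. Linear explanations, which describe a neuron as a linear combination of concepts, capture its behavior across the whole activation range.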

Abstract

In recent years, many methods have been developed to understand the internal workings of neural networks, often by describing the function of individual neurons in the model. However, these methods typically focus only on explaining the very highest activations of a neuron. In this paper we show this is not sufficient, and that the highest activation range is only responsible for a very small percentage of the neuron's causal effect. In addition, inputs causing lower activations are often very different and can't be reliably predicted by only looking at high activations. We propose that neurons should instead be understood as a linear combination of concepts, and develop an efficient method for producing these linear explanations. In addition, we show how to automatically evaluate description quality using simulation, i.e. predicting neuron activations on unseen inputs, in the vision setting.
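The idea of explaining a neuron as a linear combination of concepts, and scoring that explanation by simulation on unseen inputs, can be illustrated with a minimal sketch. This is not the authors' method or code: the data is synthetic, the concept scores are random numbers standing in for a real concept detector, the fit is ordinary least squares followed by keeping the largest weights, and the use of Pearson correlation as the simulation score is an assumption. All function names (`fit_linear_explanation`, `simulate`, `correlation_score`) are hypothetical.

```python
import numpy as np


def fit_linear_explanation(concept_scores, activations, n_concepts=3):
    """Fit activations ~ concept_scores @ weights, keeping only a few concepts.

    Illustrative stand-in: least squares, keep the n_concepts weights with
    the largest magnitude, then refit on those concepts alone.
    """
    w_full, *_ = np.linalg.lstsq(concept_scores, activations, rcond=None)
    keep = np.argsort(-np.abs(w_full))[:n_concepts]
    w_kept, *_ = np.linalg.lstsq(concept_scores[:, keep], activations, rcond=None)
    weights = np.zeros(concept_scores.shape[1])
    weights[keep] = w_kept
    return weights


def simulate(concept_scores, weights):
    """Predict neuron activations from the linear explanation."""
    return concept_scores @ weights


def correlation_score(predicted, actual):
    """Pearson correlation between simulated and actual activations (assumed metric)."""
    return float(np.corrcoef(predicted, actual)[0, 1])


# Synthetic data: 400 inputs, 10 candidate concepts, 3 of which drive the neuron.
rng = np.random.default_rng(0)
n_inputs, n_total_concepts = 400, 10
concept_scores = rng.random((n_inputs, n_total_concepts))
true_weights = np.zeros(n_total_concepts)
true_weights[[1, 4, 7]] = [2.0, 1.0, 0.5]
activations = concept_scores @ true_weights + 0.05 * rng.standard_normal(n_inputs)

# Fit on the first 300 inputs, then simulate on the 100 unseen ones.
train, test = slice(0, 300), slice(300, 400)
weights = fit_linear_explanation(concept_scores[train], activations[train])
score = correlation_score(simulate(concept_scores[test], weights), activations[test])
print("selected concepts:", np.flatnonzero(weights))
print("held-out correlation:", score)
```

Because the fit targets every activation rather than only the top few, the low-weight concept (index 7 here) still contributes to the simulated activations across the whole range, which is the behavior a description based only on the highest activations would miss.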

Code & Models

Repositories

Trustworthy-ML-Lab/Linear-Explanations (PyTorch, official)

Taxonomy

Topics: Neural Networks and Applications

Methods: Focus