Much Easier Said Than Done: Falsifying the Causal Relevance of Linear Decoding Methods

Lucas Hayne; Abhijit Suresh; Hunar Jain; Rahul Kumar; R. McKell Carter

arXiv:2211.04367 · cs.LG · November 9, 2022

TL;DR

This paper investigates the reliability of linear decoding methods for understanding neural network function, revealing weak links between interpretability probes and causal importance, and suggesting focus on causally important units for better interpretability.

Contribution

It systematically compares probe-identified units with causally important units via ablation, challenging the effectiveness of linear decoding for neural interpretability.

Findings

01

Weak correlation between probe-identified and ablation-identified units

02

Interaction between selectivity and activity predicts ablation effects

03

Linear decoders partly overlap with causally important units
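The second finding can be illustrated with a small regression sketch. It fits an ablation effect as a function of selectivity, activity, and their product, using synthetic data. The functional form, the variable names, and the data are all assumptions made for illustration; the paper's own statistical model and measurements are not reproduced here.

```python
import numpy as np


def fit_interaction_model(selectivity, activity, ablation_effect):
    """Least-squares fit of: effect ~ b0 + b1*s + b2*a + b3*(s*a).

    Returns the coefficients [b0, b1, b2, b3]. This is a hypothetical
    helper for illustration, not the analysis used in the paper.
    """
    s = np.asarray(selectivity, dtype=float)
    a = np.asarray(activity, dtype=float)
    y = np.asarray(ablation_effect, dtype=float)
    # Design matrix: intercept, two main effects, and the interaction term.
    X = np.column_stack([np.ones_like(s), s, a, s * a])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Synthetic per-unit data (assumed): the effect depends only on the product
# of selectivity and activity, plus a little noise.
rng = np.random.default_rng(0)
n_units = 500
selectivity = rng.uniform(0.0, 1.0, n_units)
activity = rng.uniform(0.0, 1.0, n_units)
effect = 2.0 * selectivity * activity + rng.normal(0.0, 0.01, n_units)

coef = fit_interaction_model(selectivity, activity, effect)
print(dict(zip(["intercept", "selectivity", "activity", "interaction"], coef)))
```

In this synthetic setup the interaction coefficient carries almost all of the signal while the main effects stay near zero, which is the pattern one would look for when testing whether selectivity alone, or only selectivity combined with activity, predicts the damage caused by ablating a unit.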

Abstract

Linear classifier probes are frequently used to better understand how neural networks function. Researchers have approached the problem of determining unit importance in neural networks by probing their learned, internal representations. Linear classifier probes identify highly selective units as the most important for network function. Whether a network actually relies on high-selectivity units can be tested by removing them from the network using ablation. Surprisingly, when highly selective units are ablated, they produce only small performance deficits, and even then only in some cases. In spite of the absence of ablation effects for selective neurons, linear decoding methods can be effectively used to interpret network function, leaving their effectiveness a mystery. To falsify the exclusive role of selectivity in network function and resolve this contradiction, we…
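The comparison described in the abstract, selectivity on one side and ablation on the other, can be sketched on a toy network. The example below is a minimal illustration under stated assumptions: a two-layer ReLU network with fixed random weights stands in for a trained model, labels are taken from the network's own unablated predictions (so baseline accuracy is 1.0 by construction), and a simple class selectivity index stands in for probe-derived selectivity. None of these choices come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer ReLU network with fixed random weights (assumed stand-in).
n_inputs, n_hidden, n_classes, n_samples = 20, 32, 2, 2000
W1 = rng.normal(size=(n_inputs, n_hidden))
W2 = rng.normal(size=(n_hidden, n_classes))
X = rng.normal(size=(n_samples, n_inputs))


def hidden(X):
    """Hidden-layer activations after the ReLU."""
    return np.maximum(X @ W1, 0.0)


def predict(H, ablate=None):
    """Class predictions from hidden activations, optionally zero-ablating one unit."""
    H = H.copy()
    if ablate is not None:
        H[:, ablate] = 0.0  # ablation: silence the unit for every input
    return (H @ W2).argmax(axis=1)


def class_selectivity(H, y, eps=1e-12):
    """Per-unit index in [0, 1]: |mu_0 - mu_1| / (mu_0 + mu_1)."""
    mu0 = H[y == 0].mean(axis=0)
    mu1 = H[y == 1].mean(axis=0)
    return np.abs(mu0 - mu1) / (mu0 + mu1 + eps)


def ablation_effects(H, y):
    """Accuracy drop caused by ablating each hidden unit in turn."""
    base = (predict(H) == y).mean()
    return np.array(
        [base - (predict(H, ablate=j) == y).mean() for j in range(H.shape[1])]
    )


H = hidden(X)
y = predict(H)  # labels defined by the intact network (assumption)
selectivity = class_selectivity(H, y)
effects = ablation_effects(H, y)

# How well does selectivity alone track the causal (ablation) effect?
r = np.corrcoef(selectivity, effects)[0, 1]
print(f"correlation between selectivity and ablation effect: {r:.3f}")
```

Zero-ablation is only one way to remove a unit; replacing its activation with the mean over the dataset is a common alternative and can give different effect sizes.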

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Ferroelectric and Negative Capacitance Devices

Methods: Depthwise Convolution · Batch Normalization · Pointwise Convolution · Depthwise Separable Convolution · 1x1 Convolution · Inverted Residual Block · Convolution · Average Pooling