Making Sense Of Distributed Representations With Activation Spectroscopy

Kyle Reing; Greg Ver Steeg; Aram Galstyan

arXiv:2501.15435·cs.LG·January 28, 2025

Open Access

TL;DR

This paper introduces Activation Spectroscopy (ActSpec), a method for interpreting distributed neural network representations by analyzing the pseudo-Boolean Fourier spectrum defined over a layer's activation patterns, with the aim of detecting and tracing the joint influence of neurons on an output logit.

Contribution

It proposes a Fourier-based formulation, in which the sub-network between a given layer and an output logit is treated as a pseudo-Boolean function, and a combinatorial optimization procedure to identify influential neuron subsets.

Findings

01. Effective in synthetic settings for detecting distributed features

02. Outperforms existing interpretability benchmarks

03. Provides insights into neuron subset influences in real models

Abstract

In the study of neural network interpretability, there is growing evidence to suggest that relevant features are encoded across many neurons in a distributed fashion. Making sense of these distributed representations without knowledge of the network's encoding strategy is a combinatorial task that is not guaranteed to be tractable. This work explores one feasible path to both detecting and tracing the joint influence of neurons in a distributed representation. We term this approach Activation Spectroscopy (ActSpec), owing to its analysis of the pseudo-Boolean Fourier spectrum defined over the activation patterns of a network layer. The sub-network defined between a given layer and an output logit is cast as a special class of pseudo-Boolean function. The contributions of each subset of neurons in the specified layer can be quantified through the function's Fourier coefficients. We…
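To make the idea of "Fourier coefficients of a pseudo-Boolean function" concrete, here is a minimal brute-force sketch. It assumes each neuron's activation pattern is encoded as -1 or +1, which is a common convention but is not confirmed by the excerpt above, and it uses a hypothetical `toy_logit` in place of a real sub-network. The coefficient for a subset S is the average of f(x) times the product of x_i for i in S, taken over all 2^n patterns. This enumeration is exponential in n and is not the paper's search procedure; it only illustrates what each coefficient measures.

```python
from itertools import combinations, product
from math import prod


def fourier_coefficients(f, n):
    """Brute-force Fourier coefficients of f: {-1,+1}^n -> R.

    f_hat(S) = 2^-n * sum over x of f(x) * prod_{i in S} x_i
    Returns a dict mapping each subset S (as a tuple of indices) to f_hat(S).
    """
    points = list(product((-1, 1), repeat=n))
    coeffs = {}
    for k in range(n + 1):
        for S in combinations(range(n), k):
            total = sum(f(x) * prod(x[i] for i in S) for x in points)
            coeffs[S] = total / len(points)
    return coeffs


# Hypothetical stand-in for "layer -> output logit" (an assumption, not from
# the paper): neurons 0 and 1 matter only jointly, neuron 2 matters alone.
def toy_logit(x):
    return 2.0 * x[0] * x[1] + 0.5 * x[2]


coeffs = fourier_coefficients(toy_logit, n=3)

# Keep only the subsets that carry any weight.
nonzero = {S: c for S, c in coeffs.items() if abs(c) > 1e-12}
print(nonzero)
```

In this toy case the only nonzero coefficients belong to the subsets (0, 1) and (2,). Neurons 0 and 1 have zero coefficient individually, so a per-neuron attribution would miss them, while the subset coefficient captures their joint influence.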

Taxonomy

Topics: Neural Networks and Applications