Making Sense Of Distributed Representations With Activation Spectroscopy
Kyle Reing, Greg Ver Steeg, Aram Galstyan

TL;DR
This paper introduces Activation Spectroscopy (ActSpec), a method for interpreting distributed neural network representations by analyzing the pseudo-Boolean Fourier spectrum of a layer's activation patterns, enabling detection and tracing of the joint influence of neurons on an output logit.
Contribution
It proposes a novel Fourier-based approach and a combinatorial optimization procedure to identify influential neuron subsets, advancing interpretability of distributed representations.
Findings
Effective in synthetic settings for detecting distributed features
Outperforms existing interpretability benchmarks
Provides insights into neuron subset influences in real models
Abstract
In the study of neural network interpretability, there is growing evidence to suggest that relevant features are encoded across many neurons in a distributed fashion. Making sense of these distributed representations without knowledge of the network's encoding strategy is a combinatorial task that is not guaranteed to be tractable. This work explores one feasible path to both detecting and tracing the joint influence of neurons in a distributed representation. We term this approach Activation Spectroscopy (ActSpec), owing to its analysis of the pseudo-Boolean Fourier spectrum defined over the activation patterns of a network layer. The sub-network defined between a given layer and an output logit is cast as a special class of pseudo-Boolean function. The contributions of each subset of neurons in the specified layer can be quantified through the function's Fourier coefficients. We…
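As a rough illustration of how Fourier coefficients can attribute influence to neuron subsets, the sketch below computes the full spectrum of a small pseudo-Boolean function by brute force. Everything in it is an assumption made for illustration: `toy_logit` is a hypothetical stand-in for the sub-network between a layer and a logit, activation patterns are taken to be binary vectors in {0,1}^n, and the parity-character convention is one common choice that the paper may or may not share. Exhaustive enumeration costs on the order of 4^n evaluations and is not the combinatorial optimization procedure the paper proposes.

```python
from itertools import combinations, product


def fourier_coefficients(f, n):
    """Brute-force Fourier spectrum of f: {0,1}^n -> R.

    Uses the parity characters chi_S(x) = (-1)^(sum of x_i for i in S),
    so that f(x) = sum over S of coeffs[S] * chi_S(x).
    """
    inputs = list(product((0, 1), repeat=n))
    coeffs = {}
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            total = 0.0
            for x in inputs:
                parity = sum(x[i] for i in subset) % 2
                total += f(x) * (-1.0 if parity else 1.0)
            coeffs[subset] = total / len(inputs)
    return coeffs


def toy_logit(x):
    # Hypothetical "logit": neurons 0 and 1 matter only jointly (XOR),
    # while neuron 2 contributes on its own.
    return 2.0 * (x[0] ^ x[1]) + 1.0 * x[2]


coeffs = fourier_coefficients(toy_logit, 3)

# Keep only the subsets that carry non-zero weight.
influential = {s: c for s, c in coeffs.items() if abs(c) > 1e-12}
print(influential)
```

In this toy case the only non-zero coefficients sit on the empty set, the pair (0, 1), and the singleton (2,). Neurons 0 and 1 receive no weight individually, which is the kind of jointly encoded feature that per-neuron attribution would miss.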
Taxonomy
Topics: Neural Networks and Applications
