Wasserstein Distances, Neuronal Entanglement, and Sparsity
Shashata Sawmya, Linghao Kong, Ilia Markov, Dan Alistarh, Nir Shavit

TL;DR
This paper introduces a novel Wasserstein distance-based measure to analyze neuronal entanglement in large language models, revealing the role of highly entangled neurons in model performance and sparsity.
Contribution
It proposes a new framework for disentangling polysemantic neurons using Wasserstein distances, identifying key entangled neurons affecting accuracy, and analyzing sparsity effects.
Findings
Highly entangled Wasserstein neurons significantly impact model accuracy.
Disentanglement via Wasserstein distances improves understanding of neuron roles.
Sparse expert mixtures effectively maintain accuracy by isolating neuron functions.
Abstract
Disentangling polysemantic neurons is at the core of many current approaches to interpretability of large language models. Here we attempt to study how disentanglement can be used to understand performance, particularly under weight sparsity, a leading post-training optimization technique. We suggest a novel measure for estimating neuronal entanglement: the Wasserstein distance of a neuron's output distribution to a Gaussian. Moreover, we show the existence of a small number of highly entangled "Wasserstein Neurons" in each linear layer of an LLM, characterized by their highly non-Gaussian output distributions, their role in mapping similar inputs to dissimilar outputs, and their significant impact on model accuracy. To study these phenomena, we propose a new experimental framework for disentangling polysemantic neurons. Our framework separates each layer's inputs to create a mixture of…
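To make the proposed measure concrete, here is a minimal sketch of estimating the 1-D Wasserstein distance between one neuron's outputs and a Gaussian. It rests on several assumptions not stated in the abstract: a neuron's output is taken as the scalar y = x · w for each input row x, outputs are standardized to zero mean and unit variance before comparison with N(0, 1), and the distance is approximated by comparing sorted samples against Gaussian quantiles. The function name `wasserstein_to_gaussian` and the synthetic data are hypothetical and are not the authors' implementation.

```python
from statistics import NormalDist

import numpy as np


def wasserstein_to_gaussian(outputs):
    """Approximate the W1 distance between standardized samples and N(0, 1).

    Assumption: outputs are standardized first, so only the shape of the
    distribution (not its location or scale) affects the score.
    """
    y = np.asarray(outputs, dtype=np.float64).ravel()
    std = y.std()
    if y.size < 2 or std == 0.0:
        raise ValueError("need at least two non-constant samples")
    # Sorted standardized samples are the empirical quantiles.
    z = np.sort((y - y.mean()) / std)
    # Matching quantiles of the standard normal at mid-point probabilities.
    probs = (np.arange(y.size) + 0.5) / y.size
    normal = NormalDist()
    q = np.array([normal.inv_cdf(float(p)) for p in probs])
    # In 1-D, W1 is the mean absolute difference between quantile functions.
    return float(np.mean(np.abs(z - q)))


rng = np.random.default_rng(0)

# Hypothetical neuron whose outputs are Gaussian: y = X @ w with Gaussian X.
X = rng.standard_normal((4096, 64))
w = rng.standard_normal(64)
wd_gaussian = wasserstein_to_gaussian(X @ w)

# Hypothetical neuron with a strongly bimodal (non-Gaussian) output.
y_bimodal = np.concatenate(
    [rng.normal(-3.0, 0.5, 2048), rng.normal(3.0, 0.5, 2048)]
)
wd_bimodal = wasserstein_to_gaussian(y_bimodal)

print(f"Gaussian-like neuron: {wd_gaussian:.4f}")
print(f"Bimodal neuron:       {wd_bimodal:.4f}")
```

Under these assumptions the bimodal neuron scores a noticeably larger distance than the Gaussian-like one, which is the sense in which the abstract describes "Wasserstein Neurons" as having highly non-Gaussian output distributions.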
Peer Reviews
Decision: ICLR 2025 Spotlight
- Understanding the relationship between sparsity and entanglement is an important research area.
- The paper conducts extensive experiments to explore the connection between Wasserstein distance, entanglement, and their impact on sparsification.
- The concept of using the Wasserstein distance has been addressed in prior work [1]. The authors could benefit from a broader literature review. It would be helpful to discuss how other works evaluate entanglement.
- The sparse expansion approach relies on prior knowledge of the distribution to support K-means clustering. How robust is this method for out-of-distribution data?
- Practical aspects of the sparse expansion, such as runtime and inference speed (e.g., with 16 experts), should be dis
I am not fully familiar with the literature on neural disentanglement and the interpretability results. However, I find the paper’s approach to improving the performance of sparsified models by analyzing neuron interpretations interesting. The authors take a step-by-step approach to justify their claims/arguments and to make connections between the different concepts discussed in the paper.
I find that many of the arguments in the paper are either based on intuition or established by showing correlations between metrics. In some cases, it is not very clear to me whether there is some sort of causation also involved. For example, I don't understand why a neuron with a smaller WD to Gaussian should be less entangled (and thus more interpretable?). In the paper, this conclusion relies on the correlation between WD and MD, which is itself an intuitive metric for measuring entanglement
The proposed sparse expansion seems to outperform SparseGPT and all other baselines, though I’m not in this subfield and I don’t know which baselines here are strong and which are weak.
- It’s very unclear what you mean by the Gaussian output distribution of a single neuron. For example, the introduction section says that a column of a weight matrix is a neuron. Here it’s unclear what the matmul operation between input and weights is. Is it a right or left multiply? Are you saying that the output scalar $y=\mathbf{w}^\top \mathbf{x}$ is a sample from the Gaussian distribution? As a reader I have to read between the lines to figure out what the proper definition is. Please write
Taxonomy
Topics: Cellular Mechanics and Interactions
Methods: LLaMA
