Superposition as Lossy Compression: Measure with Sparse Autoencoders and Connect to Adversarial Vulnerability

Leonard Bereska; Zoe Tzifa-Kratira; Reza Samavi; Efstratios Gavves

arXiv:2512.13568 · cs.LG · December 16, 2025

Open Access

TL;DR

This paper introduces an information-theoretic measure using sparse autoencoders to quantify superposition in neural networks, revealing its relationship with adversarial vulnerability and network capacity.

Contribution

It presents a novel metric for measuring superposition as lossy compression, linking it to network robustness and capacity, and challenges existing assumptions about adversarial vulnerability.

Findings

01. The metric correlates well with ground truth in toy models.

02. Superposition decreases under dropout and during feature consolidation.

03. Adversarial training can increase the number of effective features while also improving robustness.

Abstract

Neural networks achieve remarkable performance through superposition: encoding multiple features as overlapping directions in activation space rather than dedicating individual neurons to each feature. This challenges interpretability, yet we lack principled methods to measure superposition. We present an information-theoretic framework measuring a neural representation's effective degrees of freedom. We apply Shannon entropy to sparse autoencoder activations to compute the number of effective features, defined as the minimum number of neurons needed for interference-free encoding. Equivalently, this measures how many "virtual neurons" the network simulates through superposition. When networks encode more effective features than actual neurons, they must accept interference as the price of compression. Our metric strongly correlates with ground truth in toy models, detects minimal superposition in…
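The abstract describes the measure only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes (1) each sparse autoencoder latent's probability is its share of the total absolute activation mass over a dataset, (2) the effective feature count is the exponentiated Shannon entropy (perplexity) of that distribution, and (3) a superposition ratio is the effective feature count divided by the layer's actual neuron count. The helper names `effective_features` and `superposition_ratio` are hypothetical.

```python
import numpy as np

def effective_features(sae_acts: np.ndarray) -> float:
    """Hypothetical helper: entropy-based effective feature count.

    sae_acts: array of shape (n_samples, n_latents) holding SAE latent activations.
    Assumption: a latent's probability is its share of total absolute activation mass.
    """
    mass = np.abs(sae_acts).sum(axis=0)      # total activation per latent
    total = mass.sum()
    if total <= 0:
        return 0.0                           # no active latents at all
    p = mass / total                         # distribution over latents
    p = p[p > 0]                             # dead latents contribute 0 (0 log 0 = 0)
    entropy = -np.sum(p * np.log(p))         # Shannon entropy in nats
    return float(np.exp(entropy))            # perplexity = effective number of features

def superposition_ratio(sae_acts: np.ndarray, n_neurons: int) -> float:
    """Hypothetical helper: values above 1 suggest more effective features than neurons."""
    return effective_features(sae_acts) / n_neurons

# Usage: four latents used equally often give about 4 effective features.
acts = np.eye(4).repeat(10, axis=0)          # 40 samples, each activating one latent
print(effective_features(acts))              # approximately 4
print(superposition_ratio(acts, n_neurons=2))  # approximately 2, i.e. 2 "virtual neurons" per neuron
```

Exponentiating the entropy turns it into a count: it equals 1 when a single latent carries all activation mass and equals the number of latents when usage is uniform, which matches the reading of effective features as "virtual neurons" described above.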

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Explainable Artificial Intelligence (XAI)