Measuring the Representational Alignment of Neural Systems in Superposition
Sunny Liu, Habon Issa, André Longon, Liv Gorton, Meenakshi Khosla, David Klindt

TL;DR
This paper reveals that common metrics for comparing neural representations are biased by superposition effects, and proposes focusing on underlying features for accurate alignment.
Contribution
It derives analytical expressions showing how superposition distorts similarity metrics and suggests feature-based alignment as a solution.
Findings
Superposition systematically deflates similarity metrics like RSA and CKA.
Alignment scores can be misleading under superposition, sometimes inversely related to shared features.
Sparse feature recovery remains possible despite superposition, enabling better comparison methods.
Abstract
Comparing the internal representations of neural networks is a central goal in both neuroscience and machine learning. Standard alignment metrics operate on raw neural activations, implicitly assuming that similar representations produce similar activity patterns. However, neural systems frequently operate in superposition, encoding more features than they have neurons via linear compression. We derive closed-form expressions showing that superposition systematically deflates Representational Similarity Analysis, Centered Kernel Alignment, and linear regression, causing networks with identical feature content to appear dissimilar. The root cause is that these metrics depend not on the latent features themselves but on the cross-similarity between the two systems' superposition matrices, which under the assumption of random projection usually differ significantly: alignment scores…
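The deflation effect described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's experimental setup. It assumes sparse non-negative latent features, two independent Gaussian random projections standing in for the superposition matrices, and hypothetical sizes (256 features compressed into 32 neurons). It compares linear CKA computed on the shared features with linear CKA computed on the compressed activations.

```python
import numpy as np


def linear_cka(X, Y):
    """Linear CKA between two (samples x units) activation matrices."""
    # Center each unit across samples.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # Squared Frobenius norm of the cross-covariance, normalized by
    # the Frobenius norms of the two self-covariances.
    cross = np.linalg.norm(X.T @ Y, ord="fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, ord="fro")
    norm_y = np.linalg.norm(Y.T @ Y, ord="fro")
    return cross / (norm_x * norm_y)


rng = np.random.default_rng(0)

# Hypothetical sizes chosen for illustration only.
n_samples, n_features, n_neurons = 2000, 256, 32

# Sparse latent features: each feature is active about 5% of the time (assumed).
mask = rng.random((n_samples, n_features)) < 0.05
Z = mask * rng.random((n_samples, n_features))

# Two independent random projections play the role of the two systems'
# superposition matrices (an assumption of this sketch).
A = rng.standard_normal((n_features, n_neurons)) / np.sqrt(n_neurons)
B = rng.standard_normal((n_features, n_neurons)) / np.sqrt(n_neurons)

# Both systems encode exactly the same features, compressed differently.
X = Z @ A
Y = Z @ B

cka_features = linear_cka(Z, Z)      # computed on the shared latent features
cka_activations = linear_cka(X, Y)   # computed on the compressed activations

print(f"CKA on latent features:    {cka_features:.3f}")
print(f"CKA on neural activations: {cka_activations:.3f}")
```

Under these assumptions the two systems carry identical feature content, yet the score on raw activations falls well below the score on the features, because it is driven by how the two projection matrices relate to each other and not by the features they encode.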
