From Data Statistics to Feature Geometry: How Correlations Shape Superposition

Lucas Prieto; Edward Stevinson; Melih Barsbey; Tolga Birdal; Pedro A.M. Mediano

arXiv:2603.09972 · cs.LG · March 11, 2026

Open Access · 3 Reviews

TL;DR

This paper introduces BOWS, a controlled setting demonstrating that feature correlations in neural networks can lead to constructive interference, revealing new geometric structures and semantic clustering in language models.

Contribution

It extends the understanding of superposition by showing how correlated features can produce constructive interference and semantic structures, unlike the idealized uncorrelated case.

Findings

01

Correlated features can cause constructive interference in superposition.

02

Semantic clusters and cyclical structures emerge in language models.

03

Weight decay influences feature arrangements and correlations.

Abstract

A central idea in mechanistic interpretability is that neural networks represent more features than they have dimensions, arranging them in superposition to form an over-complete basis. This framing has been influential, motivating dictionary learning approaches such as sparse autoencoders. However, superposition has mostly been studied in idealized settings where features are sparse and uncorrelated. In these settings, superposition is typically understood as introducing interference that must be minimized geometrically and filtered out by non-linearities such as ReLUs, yielding local structures like regular polytopes. We show that this account is incomplete for realistic data by introducing Bag-of-Words Superposition (BOWS), a controlled setting to encode binary bag-of-words representations of internet text in superposition. Using BOWS, we find that when features are correlated,…
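The setting described in the abstract can be illustrated with a minimal sketch. It assumes a tied-weight ReLU autoencoder of the kind commonly used in toy models of superposition; the abstract does not specify the architecture, so this is an assumption rather than the authors' implementation. The vocabulary, documents, latent size `m`, and all function names are hypothetical, and the weights are random and untrained, so the output only demonstrates shapes and data flow.

```python
import random

def bag_of_words(text, vocab):
    """Binary bag-of-words: 1.0 if the vocab word occurs in the text, else 0.0."""
    tokens = set(text.lower().split())
    return [1.0 if word in tokens else 0.0 for word in vocab]

def encode(W, x):
    """Linear compression h = W x, where W has m rows (latent dims) and d columns (features)."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in W]

def decode(W, h, b):
    """Reconstruction x_hat = ReLU(W^T h + b), reusing the encoder weights (tied; an assumption)."""
    d = len(W[0])
    pre = [sum(W[i][j] * h[i] for i in range(len(W))) + b[j] for j in range(d)]
    return [max(0.0, v) for v in pre]

# Hypothetical vocabulary and documents; "cat"/"pet" and "stock"/"market" co-occur,
# which is the kind of feature correlation the abstract refers to.
vocab = ["cat", "dog", "pet", "stock", "market", "bank"]
docs = [
    "the cat is a pet",
    "my dog is a pet too",
    "the stock market fell",
    "bank stock rose",
]
X = [bag_of_words(doc, vocab) for doc in docs]

# d = 6 features compressed into m = 2 latent dimensions (more features than dimensions).
rng = random.Random(0)
m = 2
W = [[rng.gauss(0.0, 0.5) for _ in vocab] for _ in range(m)]
b = [0.0] * len(vocab)

X_hat = [decode(W, encode(W, x), b) for x in X]
```

Training `W` and `b` to minimize reconstruction error over many documents is what would make the learned feature geometry reflect the co-occurrence statistics; that step is omitted here.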

Peer Reviews

Decision · ICLR 2026 Poster

Reviewer 01 · Rating 8 · Confidence 3

Strengths

1. The paper's primary strength is its clarity. The distinction between "linear superposition" (PCA on correlated data) and "non-linear superposition" (local, ReLU-dependent structures for uncorrelated data) is a very useful and clear conceptual framework. The flow of the paper is natural, too.
2. The BOWS framework offers a good trade-off: it's more realistic than toy models but far more controllable than a full LLM. Using it to show the emergence of semantic clusters and circles from data

Weaknesses

1. The main limitation is the simplicity of the BOWS setup. An AE trained on static BoW vectors does not necessarily capture all the SAE variants that exist in the literature today, since many of them explicitly make architectural changes to cater to the space where features live.
2. Also, the paper seems to strongly imply that these structures are just byproducts and not functional. This dichotomy might be false. A model could (and likely would) exploit this emergent, PCA-driven structure for co

Reviewer 02 · Rating 2 · Confidence 3

Strengths

The observation that linear superposition is more common (if correct) seems important. The BOWS setup seems like a nice, sensible approach to introducing realistic correlation structure, and has the bonus of enabling researchers to bring their knowledge of words’ semantics to the analysis of experiments.

Weaknesses

(Major): I believe this work is in need of more rigor in establishing its central definitions and concepts (e.g. “superposition”, “linear superposition”). For instance, I’m not sure what the content is of the statement “This explicitly shows that linear dimensionality reduction enables a form of superposition (d = 12 > m = 2) by exploiting feature correlations, without requiring any non-linearity.” This just sounds like it’s saying “you can reconstruct an input well using a few principal compo

Reviewer 03 · Rating 2 · Confidence 3

Strengths

- The paper provides convincing evidence that complex structures of features can emerge purely from the task of compressing features for storage, when there are correlations in the feature distribution.
- Figure 6 provides convincing evidence that complex structures of features form and then disappear as the latent size changes, when training a network to compress information.

Weaknesses

The paper is unnecessarily complex and convoluted for its goal of helping interpretability efforts.
- BOWS is much less a "framework" than an application of a pre-existing technique for demonstrating superposition, from Elhage (2022) [1], to a custom bag-of-words dataset. Most of the "framework" parts of the work (techniques for analysis of results, PCA, visualization, training methodology) are inherited from Elhage (2022).
- The framework's goal is to show that compression alone can be respo

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Generative Adversarial Networks and Image Synthesis · Advanced Graph Neural Networks