Do Sparse Autoencoders Capture Concept Manifolds?

Usha Bhalla; Thomas Fel; Can Rager; Sheridan Feucht; Tal Haklay; Daniel Wurgaft; Siddharth Boppana; Matthew Kowal; Vasudev Shyam; Jack Merullo; Atticus Geiger; Ekdeep Singh Lubana

arXiv:2604.28119 · cs.LG · May 1, 2026

TL;DR

This paper investigates whether and how sparse autoencoders (SAEs) capture concept manifolds. It provides a theoretical framework and an empirical analysis showing that SAEs capture manifolds either globally or locally, and it discusses the implications for interpretability.

Contribution

The paper introduces a theoretical framework for what it means for an SAE to capture a manifold, characterizes the regimes in which SAEs operate and their limitations, and uses these results to guide future interpretability methods.

Findings

01

SAEs can capture manifolds globally or locally.

02

SAEs often mix global and local solutions in a fragmented regime.

03

Manifold structure is rarely visible at the level of individual concepts.

Abstract

Sparse autoencoders (SAEs) are widely used to extract interpretable features from neural network representations, often under the implicit assumption that concepts correspond to independent linear directions. However, a growing body of evidence suggests that many concepts are instead organized along low-dimensional manifolds encoding continuous geometric relationships. This raises three basic questions: what does it mean for an SAE to capture a manifold, when do existing SAE architectures do so, and how? We develop a theoretical framework that answers these questions and show that SAEs can capture manifolds in two fundamentally different ways: globally, by allocating a compact group of atoms whose linear span contains the entire manifold, or locally, by distributing it across features that each selectively tile a restricted region of the underlying geometry. Empirically, we find that…
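To make the global/local distinction concrete, here is a minimal sketch on a hypothetical toy example. It does not train an SAE and is not the paper's setup or the goodfire-ai/sae-manifold code. Instead it hand-builds two dictionaries for a circle embedded in an 8-dimensional space, and the dimension, the number of atoms `K`, and the ReLU threshold are all illustrative assumptions. In the global case, two atoms span the plane containing the circle, so every point is reconstructed exactly from that compact group. In the local case, atoms placed around the circle, each paired with a thresholded ReLU encoder, fire only on a short arc, and together they tile the whole manifold.

```python
import numpy as np

# Hypothetical toy setup (an assumption, not the paper's experiments):
# a 1-D circle manifold embedded in an 8-dimensional activation space.
rng = np.random.default_rng(0)
d = 8
basis, _ = np.linalg.qr(rng.standard_normal((d, 2)))  # orthonormal columns u, v
u, v = basis[:, 0], basis[:, 1]

def circle(theta):
    """Map angles (shape [n]) to points on the embedded unit circle (shape [n, d])."""
    return np.cos(theta)[:, None] * u + np.sin(theta)[:, None] * v

thetas = np.linspace(0.0, 2 * np.pi, 1000, endpoint=False)
X = circle(thetas)

# Global capture: a compact group of 2 atoms whose linear span contains the manifold.
D_global = basis.T                                          # shape [2, d], one atom per row
codes, *_ = np.linalg.lstsq(D_global.T, X.T, rcond=None)    # least-squares codes, shape [2, n]
global_err = np.max(np.linalg.norm(X - codes.T @ D_global, axis=1))

# Local capture: K atoms placed around the circle. Each ReLU feature fires only
# where the point is within 1.5*pi/K radians of its atom, so the features tile
# the circle with slightly overlapping arcs.
K = 12
anchors = 2 * np.pi * np.arange(K) / K
D_local = circle(anchors)                       # shape [K, d], unit-norm atoms
bias = np.cos(1.5 * np.pi / K)                  # hypothetical activation threshold
acts = np.maximum(X @ D_local.T - bias, 0.0)    # shape [n, K]
active = acts > 0
coverage = active.any(axis=1).mean()            # fraction of points with >= 1 active feature
support = active.mean(axis=0)                   # fraction of the circle each feature covers

print(f"global max reconstruction error: {global_err:.1e}")
print(f"local coverage: {coverage:.2f}, mean feature support: {support.mean():.3f}")
```

The global dictionary reconstructs every point up to floating-point error, while each local feature is active on only about 1.5/K of the circle even though every point activates at least one feature. This mirrors the contrast the abstract draws between a compact spanning group and features that selectively tile a restricted region of the geometry.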

Code & Models

Repositories

goodfire-ai/sae-manifold (GitHub)