Sparse Autoencoders are Topic Models

Leander Girrbach; Zeynep Akata

arXiv:2511.16309·cs.CV·May 19, 2026

TL;DR

This paper reinterprets sparse autoencoders as topic models, introduces SAE-TM for thematic analysis, and demonstrates its effectiveness on text and image datasets.

Contribution

The paper gives a theoretical account that links SAEs to topic models and develops SAE-TM, a topic modeling framework for thematic analysis of text and image data.

Findings

01. SAE features are thematic components rather than steerable directions.

02. SAE-TM produces more coherent topics than strong baselines.

03. Thematic structures in images and temporal changes in art are effectively analyzed.

Abstract

Sparse autoencoders (SAEs) are used to analyze embeddings, but their role and practical value are debated. We propose a new perspective on SAEs by demonstrating that they can be naturally understood as topic models. We propose a continuous topic model (CTM) inspired by Latent Dirichlet Allocation (LDA) for embedding spaces and derive the SAE objective as a maximum a posteriori estimator under this model. This view implies SAE features are thematic components rather than steerable directions. To confirm our theoretical findings, we introduce SAE-TM, a topic modeling framework that: (1) trains an SAE to learn reusable topic atoms, (2) interprets them as word distributions on downstream data, and (3) merges them into any number of topics without retraining. SAE-TM yields more coherent topics than strong baselines on text and image datasets while maintaining diversity. Finally, we analyze…
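
Steps (2) and (3) of the pipeline described in the abstract can be illustrated with a toy sketch. It is not the paper's implementation: step (1), training the SAE, is skipped and the activations are hand-written stand-ins; weighting word counts by activation and merging by simple averaging are assumptions chosen for simplicity; and the names `feature_word_distributions`, `merge_features`, and `top_words` are hypothetical. Plain Python is used so the sketch has no dependencies.

```python
from collections import Counter


def feature_word_distributions(activations, docs):
    """Step 2 (assumed rule): weight each document's word counts by a
    feature's activation, then normalize per feature.

    activations[d][f] is the non-negative activation of feature f on
    document d; docs[d] is a list of tokens.
    """
    n_features = len(activations[0])
    totals = [Counter() for _ in range(n_features)]
    for acts, tokens in zip(activations, docs):
        counts = Counter(tokens)
        for f, a in enumerate(acts):
            if a > 0:  # sparse codes: most activations are exactly zero
                for word, c in counts.items():
                    totals[f][word] += a * c
    dists = []
    for t in totals:
        mass = sum(t.values())
        # A feature that never fires gets an empty distribution.
        dists.append({w: v / mass for w, v in t.items()} if mass > 0 else {})
    return dists


def merge_features(dists, groups):
    """Step 3 (assumed rule): average the word distributions of the
    features in each group. No retraining is involved."""
    topics = []
    for group in groups:
        merged = Counter()
        for f in group:
            for w, p in dists[f].items():
                merged[w] += p / len(group)
        topics.append(dict(merged))
    return topics


def top_words(dist, k=3):
    """Return the k most probable words, ties broken alphabetically."""
    ranked = sorted(dist.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:k]]


# Toy data. The activations are made up and stand in for the output
# of a trained SAE (3 features, 4 documents).
docs = [
    ["goal", "match", "team"],
    ["team", "coach", "match"],
    ["vote", "party", "election"],
    ["election", "vote", "senate"],
]
activations = [
    [0.9, 0.0, 0.0],
    [0.0, 0.7, 0.0],
    [0.0, 0.0, 1.2],
    [0.0, 0.0, 0.8],
]

dists = feature_word_distributions(activations, docs)
# The grouping is supplied by hand here; how features are grouped in
# SAE-TM is not specified in the abstract.
topics = merge_features(dists, groups=[[0, 1], [2]])
for i, topic in enumerate(topics):
    print(i, top_words(topic, k=2))
```

Changing `groups` yields a different number of topics from the same per-feature distributions, which is the sense in which merging needs no retraining in this sketch.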

Code & Models

Repositories

ExplainableML/SAE-TM (GitHub)

Taxonomy

Topics: Computational and Text Analysis Methods · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications