TL;DR
This paper reinterprets sparse autoencoders as topic models, introduces SAE-TM for thematic analysis, and demonstrates its effectiveness on text and image datasets.
Contribution
It presents a novel theoretical perspective linking SAEs to topic models and develops SAE-TM for improved thematic analysis across modalities.
Findings
Under the topic-model view, SAE features are better understood as thematic components than as steerable directions.
SAE-TM yields more coherent topics than strong baselines on text and image datasets while maintaining diversity.
SAE-TM is also used to analyze thematic structure in images and temporal changes in art.
Abstract
Sparse autoencoders (SAEs) are used to analyze embeddings, but their role and practical value are debated. We propose a new perspective on SAEs by demonstrating that they can be naturally understood as topic models. We propose a continuous topic model (CTM) inspired by Latent Dirichlet Allocation (LDA) for embedding spaces and derive the SAE objective as a maximum a posteriori estimator under this model. This view implies SAE features are thematic components rather than steerable directions. To confirm our theoretical findings, we introduce SAE-TM, a topic modeling framework that: (1) trains an SAE to learn reusable topic atoms, (2) interprets them as word distributions on downstream data, and (3) merges them into any number of topics without retraining. SAE-TM yields more coherent topics than strong baselines on text and image datasets while maintaining diversity. Finally, we analyze…
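The three-step pipeline in the abstract can be made concrete with a toy example. The following is only an illustrative sketch, not the authors' implementation: it assumes step (1) has already produced non-negative SAE activations per document, approximates step (2) by activation-weighted word counts, and approximates step (3) by a greedy merge of the most similar features under cosine similarity. The function names (`feature_word_distributions`, `merge_into_topics`, `top_words`), the toy vocabulary, and the merging rule are all hypothetical choices made for illustration.

```python
import math

def feature_word_distributions(activations, counts):
    """Step (2), assumed form: activation-weighted word counts per feature,
    normalized so each feature's weights sum to 1."""
    n_features = len(activations[0])
    vocab_size = len(counts[0])
    dists = []
    for f in range(n_features):
        weights = [0.0] * vocab_size
        for doc_acts, doc_counts in zip(activations, counts):
            a = doc_acts[f]
            for w, c in enumerate(doc_counts):
                weights[w] += a * c
        total = sum(weights)
        dists.append([x / total for x in weights] if total > 0 else weights)
    return dists

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def merge_into_topics(dists, num_topics):
    """Step (3), assumed form: repeatedly merge the two most similar
    clusters (size-weighted average) until num_topics remain."""
    clusters = [([i], list(d)) for i, d in enumerate(dists)]
    while len(clusters) > num_topics:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cosine(clusters[i][1], clusters[j][1])
                if best is None or s > best[0]:
                    best = (s, i, j)
        _, i, j = best
        (mi, di), (mj, dj) = clusters[i], clusters[j]
        ni, nj = len(mi), len(mj)
        merged = [(ni * a + nj * b) / (ni + nj) for a, b in zip(di, dj)]
        clusters[i] = (mi + mj, merged)
        del clusters[j]
    return clusters

def top_words(dist, vocab, k):
    """Return the k highest-weight words of a topic."""
    order = sorted(range(len(dist)), key=lambda w: dist[w], reverse=True)
    return [vocab[w] for w in order[:k]]

# Toy data (hypothetical): 4 documents, 6 vocabulary words, 3 SAE features.
vocab = ["goal", "match", "team", "paint", "canvas", "museum"]
counts = [
    [2, 1, 1, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [0, 0, 0, 2, 1, 1],
    [0, 0, 0, 1, 2, 1],
]
activations = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.8],
]

dists = feature_word_distributions(activations, counts)
topics = merge_into_topics(dists, num_topics=2)
for members, dist in topics:
    print(members, top_words(dist, vocab, 3))
```

Because merging operates only on the per-feature word distributions, changing `num_topics` in this sketch does not require recomputing activations, which mirrors the abstract's point that topics can be merged to any number without retraining.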
Taxonomy
Topics: Computational and Text Analysis Methods · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications
