Sparse Autoencoders for Hypothesis Generation

Rajiv Movva; Kenny Peng; Nikhil Garg; Jon Kleinberg; Emma Pierson

arXiv:2502.04382 · cs.CL · June 10, 2025

TL;DR

HypotheSAEs is a method that combines sparse autoencoders with large language models to generate interpretable hypotheses linking text data to a target variable, using less compute than recent LLM-based methods.

Contribution

The paper introduces HypotheSAEs, a new approach combining sparse autoencoders and LLMs for interpretable hypothesis generation from text data.

Findings

01 Better identifies reference hypotheses than baselines on synthetic datasets (at least +0.06 F1)

02 Produces roughly twice as many significant findings on real datasets

03 Requires 1-2 orders of magnitude less compute than recent LLM-based methods

Abstract

We describe HypotheSAEs, a general method to hypothesize interpretable relationships between text data (e.g., headlines) and a target variable (e.g., clicks). HypotheSAEs has three steps: (1) train a sparse autoencoder on text embeddings to produce interpretable features describing the data distribution, (2) select features that predict the target variable, and (3) generate a natural language interpretation of each feature (e.g., "mentions being surprised or shocked") using an LLM. Each interpretation serves as a hypothesis about what predicts the target variable. Compared to baselines, our method better identifies reference hypotheses on synthetic datasets (at least +0.06 in F1) and produces more predictive hypotheses on real datasets (~twice as many significant findings), despite requiring 1-2 orders of magnitude less compute than recent LLM-based methods. HypotheSAEs also produces…
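The three steps in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the top-k activation, the plain gradient-descent training loop, the correlation-based selection rule, and every function name here (`topk_encode`, `train_sae`, `select_features`, `top_examples`) are assumptions chosen for illustration. Step 3 is reduced to collecting the top-activating texts per feature, which a user would then pass to an LLM of their choice to obtain a natural language interpretation.

```python
import numpy as np


def topk_encode(X, W_enc, b_enc, k):
    """Encode embeddings, keeping at most the k largest ReLU activations per row."""
    pre = np.maximum(X @ W_enc + b_enc, 0.0)
    # Indices of everything except the k largest activations in each row.
    drop = np.argsort(pre, axis=1)[:, :-k]
    Z = pre.copy()
    np.put_along_axis(Z, drop, 0.0, axis=1)
    return Z


def train_sae(X, n_features, k, steps=200, lr=0.01, seed=0):
    """Step 1 (sketch): fit a small top-k sparse autoencoder by gradient descent.

    Loss is the mean over samples of 0.5 * ||X_hat - X||^2.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W_enc = rng.normal(scale=0.1, size=(d, n_features))
    b_enc = np.zeros(n_features)
    W_dec = W_enc.T.copy()
    for _ in range(steps):
        Z = topk_encode(X, W_enc, b_enc, k)
        X_hat = Z @ W_dec
        err = (X_hat - X) / len(X)           # dLoss/dX_hat
        grad_dec = Z.T @ err
        grad_pre = (err @ W_dec.T) * (Z > 0)  # gradient flows only through kept units
        W_dec -= lr * grad_dec
        W_enc -= lr * (X.T @ grad_pre)
        b_enc -= lr * grad_pre.sum(axis=0)
    return W_enc, b_enc, W_dec


def select_features(Z, y, n_select):
    """Step 2 (sketch): rank features by absolute Pearson correlation with the target."""
    Zc = Z - Z.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Zc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    corr = (Zc.T @ yc) / denom
    order = np.argsort(-np.abs(corr))[:n_select]
    return order, corr[order]


def top_examples(Z, texts, feature, n=3):
    """Step 3 (input only): texts that most strongly activate one feature.

    These would be shown to an LLM, which is asked to describe what they share.
    """
    order = np.argsort(-Z[:, feature])[:n]
    return [texts[i] for i in order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 16))    # stand-in for real text embeddings
    y = rng.normal(size=100)          # stand-in for a target such as clicks
    texts = [f"headline {i}" for i in range(100)]

    W_enc, b_enc, W_dec = train_sae(X, n_features=32, k=4)
    Z = topk_encode(X, W_enc, b_enc, k=4)
    selected, corr = select_features(Z, y, n_select=5)
    for f, c in zip(selected, corr):
        print(f, round(float(c), 3), top_examples(Z, texts, f))
```

The data in the demo is random, so the selected features carry no meaning; the point is only the shape of the pipeline. With real embeddings, each selected feature's top-activating texts would be the material an LLM summarizes into a candidate hypothesis.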

Code & Models

Repositories

rmovva/HypotheSAEs
PyTorch · Official

Taxonomy

Topics: Computational Physics and Python Applications · Anomaly Detection Techniques and Applications · Machine Learning and Data Classification

Methods: Sparse Autoencoder