SynthSAEBench: Evaluating Sparse Autoencoders on Scalable Realistic Synthetic Data

David Chanin; Adrià Garriga-Alonso

arXiv:2602.14687·cs.LG·February 17, 2026

TL;DR

SynthSAEBench is a toolkit and benchmark for evaluating Sparse Autoencoders (SAEs) on large-scale synthetic data with realistic feature structure, enabling precise diagnosis of architectural improvements and failure modes.

Contribution

We introduce SynthSAEBench, a scalable synthetic-data toolkit together with a standardized benchmark model, SynthSAEBench-16k, for detailed SAE evaluation and analysis.

Findings

01. Reproduces known LLM SAE phenomena
02. Identifies superposition noise exploitation as a failure mode
03. Highlights overfitting risks with more expressive encoders

Abstract

Improving Sparse Autoencoders (SAEs) requires benchmarks that can precisely validate architectural innovations. However, current SAE benchmarks on LLMs are often too noisy to differentiate architectural improvements, and current synthetic data experiments are too small-scale and unrealistic to provide meaningful comparisons. We introduce SynthSAEBench, a toolkit for generating large-scale synthetic data with realistic feature characteristics including correlation, hierarchy, and superposition, and a standardized benchmark model, SynthSAEBench-16k, enabling direct comparison of SAE architectures. Our benchmark reproduces several previously observed LLM SAE phenomena, including the disconnect between reconstruction and latent quality metrics, poor SAE probing results, and a precision-recall trade-off mediated by L0. We further use our benchmark to identify a new failure mode: Matching…
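The abstract names three feature properties the generator reproduces: correlation, hierarchy, and superposition. As a rough illustration only, the toy NumPy sketch below shows one way data with those properties could be produced, and how a precision/recall score for a single latent might be computed against ground-truth features. It is not the SynthSAEBench implementation or API: the function names `make_synthetic_data` and `precision_recall`, the single shared Gaussian factor used for correlation, the parent/child gating used for hierarchy, and all sizes and probabilities are hypothetical choices.

```python
import numpy as np
from statistics import NormalDist


def make_synthetic_data(n_samples, n_features=64, d_model=16, n_parents=8,
                        firing_prob=0.05, child_prob=0.3, rho=0.3, seed=0):
    """Hypothetical toy generator, not the SynthSAEBench API.

    Returns (x, acts, W): model-space activations, ground-truth feature
    activations, and unit-norm feature directions.
    """
    rng = np.random.default_rng(seed)

    # Superposition: more unit-norm feature directions than model dimensions.
    W = rng.standard_normal((n_features, d_model))
    W /= np.linalg.norm(W, axis=1, keepdims=True)

    # Correlation: one shared Gaussian factor per sample. Each z has unit
    # variance, so thresholding at the (1 - p) normal quantile fires with rate p.
    shared = rng.standard_normal((n_samples, 1))
    own = rng.standard_normal((n_samples, n_features))
    z = np.sqrt(rho) * shared + np.sqrt(1.0 - rho) * own
    thresholds = np.full(n_features, NormalDist().inv_cdf(1.0 - firing_prob))
    thresholds[n_parents:] = NormalDist().inv_cdf(1.0 - child_prob)
    active = z > thresholds

    # Hierarchy: every non-parent feature is assigned a parent and is
    # suppressed whenever that parent is inactive.
    parent_of = np.arange(n_features - n_parents) % n_parents
    active[:, n_parents:] &= active[:, parent_of]

    # Active features get a random positive magnitude; x sums their directions.
    mags = rng.uniform(0.5, 1.5, size=(n_samples, n_features))
    acts = np.where(active, mags, 0.0)
    x = acts @ W
    return x, acts, W


def precision_recall(pred, true):
    """Precision and recall of a boolean 'latent fired' mask vs a feature mask."""
    tp = np.sum(pred & true)
    precision = tp / max(np.sum(pred), 1)
    recall = tp / max(np.sum(true), 1)
    return float(precision), float(recall)


x, acts, W = make_synthetic_data(n_samples=10_000)
true_l0 = (acts > 0).sum(axis=1).mean()

# Naive "latent" for feature 0: project onto its true direction and threshold.
latent = x @ W[0]
p, r = precision_recall(latent > 0.5, acts[:, 0] > 0)
print(f"true L0 = {true_l0:.2f}, precision = {p:.2f}, recall = {r:.2f}")
```

Because feature 0 shares the 16-dimensional space with 63 other directions, interference from co-active features can push the projection across the threshold in either direction, so precision and recall are generally below 1, and sweeping the threshold trades one against the other. This is only a loose analogue of the L0-mediated precision-recall trade-off the abstract reports for trained SAEs.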

Code & Models

Models

decoderesearch/synth-sae-bench-16k-v1-saes (Hugging Face model)
