Data Whitening Improves Sparse Autoencoder Learning

Ashwin Saraswatula; David Klindt

arXiv:2511.13981·cs.LG·November 19, 2025

TL;DR

Applying PCA whitening to input activations significantly improves the interpretability and optimization efficiency of sparse autoencoders (SAEs) across various architectures and metrics. The authors advocate adopting it as a standard step in SAE training.

Contribution

This work demonstrates that PCA whitening improves SAE performance and interpretability by transforming the optimization landscape, supported by theoretical analysis and extensive empirical evaluation.

Findings

1. Whitening improves interpretability metrics like sparse probing accuracy.

2. Whitening makes the optimization landscape more convex and easier to optimize.

3. Minor drops in reconstruction quality occur with whitening, but interpretability benefits outweigh these.
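
One way to build intuition for the second finding is to look at the condition number of the input covariance, which governs how fast gradient descent converges on a simple quadratic (least-squares) objective. The sketch below is an illustration under stated assumptions, not the paper's analysis: the data is synthetic, and names such as `condition_number`, `scales`, and `rotation` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for correlated activations (assumption: not data from the paper).
n, d = 5000, 8
scales = np.logspace(0, -2, d)                       # per-direction std from 1 down to 0.01
rotation, _ = np.linalg.qr(rng.normal(size=(d, d)))  # random orthogonal basis
x = (rng.normal(size=(n, d)) * scales) @ rotation.T
x -= x.mean(axis=0)


def condition_number(samples):
    """Ratio of largest to smallest eigenvalue of the sample covariance."""
    cov = samples.T @ samples / samples.shape[0]
    eigvals = np.linalg.eigvalsh(cov)  # returned in ascending order
    return eigvals[-1] / eigvals[0]


# PCA whitening: rotate onto the principal axes, then rescale each axis to unit variance.
cov = x.T @ x / n
eigvals, eigvecs = np.linalg.eigh(cov)
x_white = (x @ eigvecs) / np.sqrt(eigvals)

cond_raw = condition_number(x)
cond_white = condition_number(x_white)
print(f"condition number before whitening: {cond_raw:.1f}")
print(f"condition number after whitening:  {cond_white:.6f}")
```

In this toy setting the whitened covariance is the identity, so every direction has the same curvature. Whether the same picture carries over to the non-quadratic SAE objective is what the paper's theoretical analysis and simulations address.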

Abstract

Sparse autoencoders (SAEs) have emerged as a promising approach for learning interpretable features from neural network activations. However, the optimization landscape for SAE training can be challenging due to correlations in the input data. We demonstrate that applying PCA whitening to input activations -- a standard preprocessing technique in classical sparse coding -- improves SAE performance across multiple metrics. Through theoretical analysis and simulation, we show that whitening transforms the optimization landscape, making it more convex and easier to navigate. We evaluate both ReLU and Top-K SAEs across diverse model architectures, widths, and sparsity regimes. Empirical evaluation on SAEBench, a comprehensive benchmark for sparse autoencoders, reveals that whitening consistently improves interpretability metrics, including sparse probing accuracy and feature…
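
The PCA whitening step described in the abstract can be sketched as a small preprocessing routine. This is a minimal illustration, not the authors' implementation: the function names (`fit_pca_whitening`, `whiten`, `unwhiten`), the `eps` stabilizer, and the synthetic `acts` array are all assumptions introduced here.

```python
import numpy as np


def fit_pca_whitening(x, eps=1e-8):
    """Estimate the mean, principal axes, and per-axis scale from samples x of shape (n, d).

    eps is a small stabilizer added to the eigenvalues (an assumption of this sketch).
    """
    mean = x.mean(axis=0)
    centered = x - mean
    cov = centered.T @ centered / x.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)  # symmetric eigendecomposition
    scale = np.sqrt(eigvals + eps)
    return mean, eigvecs, scale


def whiten(x, mean, eigvecs, scale):
    """Center, rotate onto the principal axes, and rescale each axis to unit variance."""
    return (x - mean) @ eigvecs / scale


def unwhiten(z, mean, eigvecs, scale):
    """Invert whiten(), e.g. to map a reconstruction back to the original activation space."""
    return (z * scale) @ eigvecs.T + mean


# Synthetic correlated "activations" (hypothetical data for illustration only).
rng = np.random.default_rng(0)
d = 16
rotation, _ = np.linalg.qr(rng.normal(size=(d, d)))
acts = (rng.normal(size=(2000, d)) * np.linspace(0.5, 3.0, d)) @ rotation.T + 1.0

mean, eigvecs, scale = fit_pca_whitening(acts)
z = whiten(acts, mean, eigvecs, scale)
recon = unwhiten(z, mean, eigvecs, scale)

cov_z = z.T @ z / z.shape[0]
print("max deviation of whitened covariance from identity:", np.abs(cov_z - np.eye(d)).max())
print("round trip recovers the inputs:", np.allclose(recon, acts))
```

In a setup like this, an SAE would be trained on `z` rather than `acts`, and `unwhiten` would be applied to its output whenever reconstruction quality needs to be measured in the original activation space.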

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning