Probing the Representational Power of Sparse Autoencoders in Vision Models

Matthew Lyle Olson; Musashi Hinck; Neale Ratzlaff; Changbai Li; Phillip Howard; Vasudev Lal; Shao-Yen Tseng

arXiv:2508.11277·cs.CV·September 19, 2025

TL;DR

This paper evaluates the use of Sparse Autoencoders in vision models, showing they produce meaningful features that enhance interpretability, generalization, and controllability across various vision architectures.

Contribution

The paper provides the first extensive evaluation of the representational power of SAEs in vision models, demonstrating their benefits for interpretability and model control.

Findings

01

SAE features are semantically meaningful in vision models.

02

SAEs improve out-of-distribution detection and ontological structure recovery.

03

SAEs enable semantic steering and reveal shared representations across modalities.

Abstract

Sparse Autoencoders (SAEs) have emerged as a popular tool for interpreting the hidden states of large language models (LLMs). By learning to reconstruct activations from a sparse bottleneck layer, SAEs discover interpretable features from the high-dimensional internal representations of LLMs. Despite their popularity with language models, SAEs remain understudied in the visual domain. In this work, we provide an extensive evaluation of the representational power of SAEs for vision models using a broad range of image-based tasks. Our experimental results demonstrate that SAE features are semantically meaningful, improve out-of-distribution generalization, and enable controllable generation across three vision model architectures: vision embedding models, multi-modal LMMs and diffusion models. In vision embedding models, we find that learned SAE features can be used for OOD detection and…
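
The abstract's description of an SAE, reconstructing activations through a sparse bottleneck, can be sketched in a few lines of NumPy. This is a minimal illustration only: the excerpt does not say which SAE variant the paper uses, so the ReLU encoder, the L1 sparsity penalty, the tied initialization, and every name here (`init_sae`, `sae_forward`, `sae_loss`, `l1_coeff`) are assumptions chosen for clarity, not the authors' implementation.

```python
import numpy as np


def init_sae(d_model, d_hidden, seed=0):
    """Create parameters for a toy SAE (hypothetical helper, not from the paper)."""
    rng = np.random.default_rng(seed)
    w_enc = rng.normal(0.0, 0.1, size=(d_model, d_hidden))
    return {
        "W_enc": w_enc,
        "b_enc": np.zeros(d_hidden),
        "W_dec": w_enc.T.copy(),  # decoder initialized as the encoder transpose (an assumption)
        "b_dec": np.zeros(d_model),
    }


def sae_forward(params, x):
    """Encode activations x into non-negative codes, then reconstruct them."""
    pre = (x - params["b_dec"]) @ params["W_enc"] + params["b_enc"]
    codes = np.maximum(0.0, pre)  # ReLU keeps codes non-negative and allows exact zeros
    recon = codes @ params["W_dec"] + params["b_dec"]
    return codes, recon


def sae_loss(params, x, l1_coeff=1e-3):
    """Reconstruction error plus an L1 penalty that encourages sparse codes."""
    codes, recon = sae_forward(params, x)
    mse = np.mean(np.sum((recon - x) ** 2, axis=-1))
    l1 = np.mean(np.sum(np.abs(codes), axis=-1))
    return mse + l1_coeff * l1


# Stand-in for a batch of model activations: 4 vectors of width 16.
activations = np.random.default_rng(1).normal(size=(4, 16))
params = init_sae(d_model=16, d_hidden=64)
codes, recon = sae_forward(params, activations)
loss = sae_loss(params, activations)
```

Here the bottleneck is wider than the input (64 versus 16), which is typical for SAEs: sparsity comes from the penalty on the codes, not from a narrow layer. Training would minimize `sae_loss` over real model activations with a gradient-based optimizer, which this sketch omits.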
