SoftSAE: Dynamic Top-K Selection for Adaptive Sparse Autoencoders

Jakub Stępień; Marcin Mazur; Jacek Tabor; Przemysław Spurek

arXiv:2605.06610·cs.LG·May 11, 2026

TL;DR

SoftSAE introduces a differentiable, input-dependent Top-K mechanism for sparse autoencoders, enabling adaptive feature selection that improves interpretability and representation quality in neural networks.

Contribution

The paper proposes a Soft Top-K operator that lets a sparse autoencoder adjust its sparsity level for each input, with the aim of improving interpretability and feature relevance.
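To make the idea of a differentiable, per-input Top-K more concrete, here is a minimal sketch of one generic relaxation. It is not taken from the paper and may differ from the authors' operator: each feature gets a sigmoid gate around a threshold, and the threshold is found by bisection so that the gates sum to a target `k`. The names `soft_topk_gates`, `temperature`, and `iters` are hypothetical, and `k` may be fractional and different for every input.

```python
import math


def _sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def soft_topk_gates(scores, k, temperature=0.1, iters=60):
    """Return gates in (0, 1) whose sum is approximately k.

    Illustrative relaxation only (an assumption, not the paper's method):
    gate_i = sigmoid((score_i - t) / temperature), with the threshold t
    chosen by bisection so that the gates sum to k. Requires 0 < k < len(scores).
    """
    def gate_sum(t):
        return sum(_sigmoid((s - t) / temperature) for s in scores)

    # The gate sum decreases monotonically as the threshold t increases.
    lo = min(scores) - 50.0 * temperature   # here the sum is close to len(scores)
    hi = max(scores) + 50.0 * temperature   # here the sum is close to 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gate_sum(mid) > k:
            lo = mid   # too many features switched on, raise the threshold
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    return [_sigmoid((s - t) / temperature) for s in scores]


scores = [3.0, 1.0, 0.5, -2.0]
# The same scores gated with two different budgets, as if k were chosen per input.
gates_small = soft_topk_gates(scores, k=1.5)
gates_large = soft_topk_gates(scores, k=2.5)
```

Because the gates vary smoothly with both the scores and `k`, a budget produced by a learned per-input predictor could in principle receive gradients, which a hard Top-K cutoff does not allow.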

Findings

01 SoftSAE effectively learns meaningful features.

02 Adaptive sparsity improves data representation.

03 Code available at https://github.com/St0pien/SoftSAE.

Abstract

Sparse Autoencoders (SAEs) have become an important tool in mechanistic interpretability, helping to analyze internal representations in both Large Language Models (LLMs) and Vision Transformers (ViTs). By decomposing polysemantic activations into sparse sets of monosemantic features, SAEs aim to translate neural network computations into human-understandable concepts. However, common architectures such as TopK SAEs rely on a fixed sparsity level. They enforce the same number of active features (K) across all inputs, ignoring the varying complexity of real-world data. Natural data often lies on manifolds with varying local intrinsic dimensionality, meaning the number of relevant factors can change significantly across samples. This suggests that a fixed sparsity level is not optimal. Simple inputs may require only a few features, while more complex ones need more expressive…
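The fixed-sparsity behaviour described in the abstract can be illustrated with a minimal sketch of a hard Top-K activation. This is a simplified stand-in written for this page, not code from the SoftSAE repository; the function name `topk_activation` and the toy inputs are assumptions.

```python
def topk_activation(pre_acts, k):
    """Keep the k largest pre-activations and zero the rest (fixed sparsity)."""
    if k <= 0:
        return [0.0] * len(pre_acts)
    # Indices of the k largest values; sorted() is stable, so ties go to the lower index.
    order = sorted(range(len(pre_acts)), key=lambda i: -pre_acts[i])
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(pre_acts)]


# Toy pre-activations (hypothetical values chosen for illustration).
simple_input = [5.0, 0.1, 0.05, 0.02]   # one dominant feature
complex_input = [2.0, 1.9, 1.8, 1.7]    # several comparable features

simple_codes = topk_activation(simple_input, k=2)
complex_codes = topk_activation(complex_input, k=2)
```

With the same `k`, the simple input is forced to keep a near-zero second feature, while the complex input loses two features that are almost as strong as the ones kept. That mismatch is the limitation the abstract attributes to a fixed sparsity level.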

Code & Models

Repositories

St0pien/SoftSAE
github