Adaptive Sparse Allocation with Mutual Choice & Feature Choice Sparse Autoencoders
Kola Ayonrinde

TL;DR
This paper introduces two adaptive sparse autoencoder variants, Mutual Choice and Feature Choice, that let the number of active features vary from token to token, which improves feature utilization and reconstruction performance.
Contribution
The paper proposes two novel SAE variants, Feature Choice and Mutual Choice, that allow flexible sparsity allocation, and it introduces a new auxiliary loss to improve feature utilization.
Findings
Fewer dead features with the new methods.
Improved reconstruction loss at the same sparsity levels.
Enhanced interpretability and scalability of feature extraction.
Abstract
Sparse autoencoders (SAEs) are a promising approach to extracting features from neural networks, enabling model interpretability as well as causal interventions on model internals. SAEs generate sparse feature representations using a sparsifying activation function that implicitly defines a set of token-feature matches. We frame the token-feature matching as a resource allocation problem constrained by a total sparsity upper bound. For example, TopK SAEs solve this allocation problem with the additional constraint that each token matches with at most k features. In TopK SAEs, the active features per token constraint is the same across tokens, despite some tokens being more difficult to reconstruct than others. To address this limitation, we propose two novel SAE variants, Feature Choice SAEs and Mutual Choice SAEs, which each allow for a variable number of active features per…
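The allocation framing in the abstract can be made concrete with a small sketch. The code below is illustrative only and is not taken from the paper: the function names, the toy pre-activation values, and the `budget` parameter are all assumptions. It contrasts a per-token TopK rule (each token keeps at most k features) with a rule that spends the same total budget across the whole batch, which is one way a variable number of active features per token can arise.

```python
# Illustrative sketch only; names and values are hypothetical, not the paper's code.

def topk_per_token(pre_acts, k):
    """Keep the k largest pre-activations in each row (one row per token)."""
    out = []
    for row in pre_acts:
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        keep = set(ranked[:k])
        out.append([v if j in keep else 0.0 for j, v in enumerate(row)])
    return out


def topk_over_batch(pre_acts, budget):
    """Keep the `budget` largest pre-activations across all token-feature pairs."""
    pairs = [(v, i, j) for i, row in enumerate(pre_acts) for j, v in enumerate(row)]
    keep = {(i, j) for _, i, j in sorted(pairs, reverse=True)[:budget]}
    return [
        [v if (i, j) in keep else 0.0 for j, v in enumerate(row)]
        for i, row in enumerate(pre_acts)
    ]


def active_counts(acts):
    """Number of nonzero features per token."""
    return [sum(1 for v in row if v != 0.0) for row in acts]


# Toy batch: 2 tokens x 4 features (assumed positive pre-activations).
pre_acts = [
    [0.9, 0.8, 0.7, 0.1],     # a token with several strong candidate features
    [0.6, 0.05, 0.02, 0.01],  # a token with one strong candidate feature
]

per_token = topk_per_token(pre_acts, k=2)
batch_wide = topk_over_batch(pre_acts, budget=4)

print(active_counts(per_token))   # [2, 2]
print(active_counts(batch_wide))  # [3, 1]
```

Both rules activate four token-feature pairs in total, but the batch-wide rule assigns three features to the first token and one to the second, while the per-token rule forces two on each.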
Taxonomy
Topics: Recommender Systems and Techniques · Face recognition and analysis · Advanced Data Compression Techniques
Methods: Sparse Evolutionary Training
