Adaptive Sparse Allocation with Mutual Choice & Feature Choice Sparse Autoencoders

Kola Ayonrinde

arXiv:2411.02124·cs.LG·November 11, 2024

Open Access

TL;DR

This paper introduces Mutual Choice and Feature Choice sparse autoencoders, which let the number of active features vary per token, leading to better feature utilization and improved reconstruction performance.

Contribution

The paper proposes two novel SAE variants, Feature Choice and Mutual Choice, that allow flexible sparsity allocation, and it introduces a new auxiliary loss to enhance feature utilization.

Findings

01

Fewer dead features with the new methods.

02

Lower reconstruction loss at the same sparsity level.

03

Enhanced interpretability and scalability of feature extraction.

Abstract

Sparse autoencoders (SAEs) are a promising approach to extracting features from neural networks, enabling model interpretability as well as causal interventions on model internals. SAEs generate sparse feature representations using a sparsifying activation function that implicitly defines a set of token-feature matches. We frame the token-feature matching as a resource allocation problem constrained by a total sparsity upper bound. For example, TopK SAEs solve this allocation problem with the additional constraint that each token matches with at most $k$ features. In TopK SAEs, the $k$ active features per token constraint is the same across tokens, despite some tokens being more difficult to reconstruct than others. To address this limitation, we propose two novel SAE variants, Feature Choice SAEs and Mutual Choice SAEs, which each allow for a variable number of active features per…
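The allocation framing in the abstract can be illustrated with a small sketch. It contrasts the TopK rule described above (at most $k$ active features per token) with a purely illustrative alternative that spends the same total budget over the whole batch with no per-token cap. The function names, the toy activations, and the global-budget rule are assumptions made for this example; they are not taken from the paper and should not be read as the Feature Choice or Mutual Choice algorithms themselves.

```python
from typing import List

Matrix = List[List[float]]  # rows = tokens, columns = features


def topk_per_token_mask(acts: Matrix, k: int) -> List[List[bool]]:
    """TopK-style allocation: keep the k largest pre-activations in each row."""
    mask = []
    for row in acts:
        # Indices of the k largest entries for this token.
        order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        keep = set(order[:k])
        mask.append([j in keep for j in range(len(row))])
    return mask


def global_budget_mask(acts: Matrix, budget: int) -> List[List[bool]]:
    """Hypothetical allocation: keep the `budget` largest entries in the batch.

    Illustrative assumption only: the total sparsity bound is respected,
    but no token is forced to use exactly k features.
    """
    flat = [(v, i, j) for i, row in enumerate(acts) for j, v in enumerate(row)]
    flat.sort(key=lambda t: t[0], reverse=True)
    keep = {(i, j) for _, i, j in flat[:budget]}
    return [[(i, j) in keep for j in range(len(row))]
            for i, row in enumerate(acts)]


def active_per_token(mask: List[List[bool]]) -> List[int]:
    """Count the active token-feature matches in each row."""
    return [sum(row) for row in mask]


# Toy pre-activations (made up for illustration).
acts = [
    [0.9, 0.8, 0.7, 0.1],     # a token with several strong candidate features
    [0.6, 0.05, 0.02, 0.01],  # a token with one strong candidate feature
]
k = 2
topk = topk_per_token_mask(acts, k)
glob = global_budget_mask(acts, budget=k * len(acts))

print(active_per_token(topk))  # [2, 2]
print(active_per_token(glob))  # [3, 1]
```

Both masks activate four token-feature matches in total, so the overall sparsity is the same. Only the second lets the first token use three features while the second token uses one, which is the kind of variable per-token allocation the abstract motivates.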

Taxonomy

Topics: Recommender Systems and Techniques · Face recognition and analysis · Advanced Data Compression Techniques

Methods: Sparse Evolutionary Training