Supervised sparse auto-encoders for interpretable and compositional representations

Ouns El Harzli; Hugo Wallner; Yoonsoo Nam; Haixuan Xavier Tao

arXiv:2602.00924·cs.AI·May 19, 2026

TL;DR

This paper introduces supervised sparse auto-encoders that produce interpretable, compositional representations, enabling semantic image editing and generalization to unseen concept combinations.

Contribution

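The paper adapts unconstrained feature models, a mathematical framework from neural collapse theory, to supervised (decoder-only) sparse auto-encoders. Sparse concept embeddings and decoder weights are learned jointly to reconstruct feature vectors, which targets both the interpretability and the compositionality of the learned features.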

Findings

01 Demonstrates compositional generalization on Stable Diffusion 3.5

02 Enables feature-level semantic image editing

03 Addresses non-smoothness and alignment issues in sparse auto-encoders

Abstract

Sparse auto-encoders (SAEs) have re-emerged as a prominent method for mechanistic interpretability, yet they face two significant challenges: the non-smoothness of the $L_{1}$ penalty, which hinders reconstruction and scalability, and a lack of alignment between learned features and human semantics. In this paper, we address these limitations by adapting unconstrained feature models, a mathematical framework from neural collapse theory, and by supervising the task. We supervise (decoder-only) SAEs to reconstruct feature vectors by jointly learning sparse concept embeddings and decoder weights. Validated on Stable Diffusion 3.5, our approach demonstrates compositional generalization, successfully reconstructing images with concept combinations unseen during training, and enabling feature-level intervention for semantic image editing without prompt modification.
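
To make the idea of a supervised, decoder-only SAE concrete, here is a minimal sketch in NumPy. It is not the authors' implementation. It assumes a linear decoder, synthetic additive data in place of Stable Diffusion 3.5 feature vectors, and a block-structured code in which concept labels decide which blocks are non-zero, so sparsity comes from supervision rather than from an $L_{1}$ penalty. The names `build_codes`, `train`, `E`, and `W` are hypothetical.

```python
import numpy as np

def build_codes(masks, E):
    """Place embedding E[k] in block k of the code when concept k is active.

    masks: (n, K) multi-hot concept labels; E: (K, m) concept embeddings.
    Returns codes of shape (n, K * m); blocks of inactive concepts are zero.
    """
    n, K = masks.shape
    return (masks[:, :, None] * E[None, :, :]).reshape(n, -1)

def train(X, masks, m=4, lr=0.05, steps=5000, seed=0):
    """Jointly fit concept embeddings E and a linear decoder W (assumed setup)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    K = masks.shape[1]
    E = rng.normal(scale=0.5, size=(K, m))
    W = rng.normal(scale=0.5, size=(K * m, d))
    losses = []
    for _ in range(steps):
        Z = build_codes(masks, E)
        R = Z @ W - X                      # reconstruction residual, (n, d)
        losses.append(0.5 * np.sum(R ** 2) / n)
        grad_W = Z.T @ R / n               # gradient w.r.t. decoder weights
        grad_Z = R @ W.T / n               # gradient w.r.t. codes, (n, K * m)
        # Only active concepts receive gradient for their embedding.
        grad_E = (grad_Z.reshape(n, K, m) * masks[:, :, None]).sum(axis=0)
        W -= lr * grad_W
        E -= lr * grad_E
    return E, W, losses

# Synthetic stand-in for feature vectors: each concept adds a fixed direction.
rng = np.random.default_rng(1)
C = rng.normal(size=(3, 8))
train_masks = np.array(
    [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 1, 1]], dtype=float
)
X_train = train_masks @ C
E, W, losses = train(X_train, train_masks)

# Concepts 0 and 2 never appear together during training.
held_out = np.array([[1, 0, 1]], dtype=float)
target = held_out @ C
x_hat = build_codes(held_out, E) @ W
rel_err = np.linalg.norm(x_hat - target) / np.linalg.norm(target)
print(f"train loss {losses[0]:.3f} -> {losses[-1]:.2e}, held-out rel. error {rel_err:.2e}")
```

In this toy setting, decoding the held-out mask plays the role of reconstructing an unseen concept combination, and flipping an entry of the mask before decoding is a loose analogue of feature-level intervention. Whether the paper's decoder, code layout, or training objective match these choices is not stated in the abstract.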
