CASL: Concept-Aligned Sparse Latents for Interpreting Diffusion Models
Zhenghao He, Guangzhi Xiong, Boyang Wang, Sanchit Sinha, Aidong Zhang

TL;DR
CASL introduces a supervised framework that aligns sparse latent dimensions of diffusion models with semantic concepts, enabling more interpretable and controllable image generation.
Contribution
This work is the first to achieve supervised alignment of latent representations with semantic concepts in diffusion models, enhancing interpretability and control.
Findings
CASL achieves superior editing precision and interpretability.
CASL-Steer effectively reveals how concept latents influence generated content.
The method provides a causal probe for semantic understanding in diffusion models.
Abstract
Internal activations of diffusion models encode rich semantic information, but interpreting such representations remains challenging. While Sparse Autoencoders (SAEs) have shown promise in disentangling latent representations, existing SAE-based methods for diffusion model understanding rely on unsupervised approaches that fail to align sparse features with human-understandable concepts. This limits their ability to provide reliable semantic control over generated images. We introduce CASL (Concept-Aligned Sparse Latents), a supervised framework that aligns sparse latent dimensions of diffusion models with semantic concepts. CASL first trains an SAE on frozen U-Net activations to obtain disentangled latent representations, and then learns a lightweight linear mapping that associates each concept with a small set of relevant latent dimensions. To validate the semantic meaning of these…
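The two-stage pipeline described in the abstract (a sparse autoencoder on frozen activations, then a lightweight linear map from concepts to a few latent dimensions) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. Everything specific here is an assumption made for illustration: random vectors stand in for U-Net activations, the two binary "concepts" are synthetic, the linear map is fit by ridge regression, and the names train_sae, encode, fit_concept_map, and top_latents are hypothetical.

```python
import numpy as np

def train_sae(X, n_latents, l1=1e-3, lr=1e-2, steps=200, seed=0):
    """Fit a one-layer ReLU sparse autoencoder with an L1 penalty by plain gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(0.0, 0.1, (d, n_latents))
    b_enc = np.zeros(n_latents)
    W_dec = rng.normal(0.0, 0.1, (n_latents, d))
    for _ in range(steps):
        pre = X @ W_enc + b_enc
        Z = np.maximum(pre, 0.0)              # sparse, non-negative latents
        err = Z @ W_dec - X                   # reconstruction error
        # Gradients of mean(0.5 * ||err||^2 + l1 * ||Z||_1) over the batch.
        grad_W_dec = Z.T @ err / n
        grad_Z = err @ W_dec.T / n + l1 * (Z > 0) / n
        grad_pre = grad_Z * (pre > 0)
        W_enc -= lr * (X.T @ grad_pre)
        b_enc -= lr * grad_pre.sum(axis=0)
        W_dec -= lr * grad_W_dec
    return W_enc, b_enc, W_dec

def encode(X, W_enc, b_enc):
    """Map activations to sparse latents with the trained encoder."""
    return np.maximum(X @ W_enc + b_enc, 0.0)

def fit_concept_map(Z, Y, ridge=1e-2):
    """Ridge-regression map from latents to concept labels (assumed form of the linear mapping)."""
    k = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + ridge * np.eye(k), Z.T @ Y)

def top_latents(W_map, k):
    """For each concept, return the k latent indices with the largest absolute weight."""
    return np.argsort(-np.abs(W_map), axis=0)[:k].T

# Synthetic stand-ins: 256 "activation" vectors of width 16, two toy binary concepts.
rng = np.random.default_rng(1)
X = rng.normal(size=(256, 16))
Y = np.stack([X[:, 0] > 0, X[:, 1] > 0], axis=1).astype(float)

W_enc, b_enc, W_dec = train_sae(X, n_latents=32)
Z = encode(X, W_enc, b_enc)          # stage 1: sparse latents from the frozen features
W_map = fit_concept_map(Z, Y)        # stage 2: linear concept-to-latent association
selected = top_latents(W_map, k=4)   # a small set of latent dimensions per concept
print(selected)
```

Selecting the top-weighted dimensions is only one way to obtain "a small set of relevant latent dimensions"; a sparsity penalty on the mapping itself would be another, and the abstract as quoted does not say which the authors use.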
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Functional Brain Connectivity Studies
