CASL: Concept-Aligned Sparse Latents for Interpreting Diffusion Models

Zhenghao He; Guangzhi Xiong; Boyang Wang; Sanchit Sinha; Aidong Zhang

arXiv:2601.15441·cs.LG·January 23, 2026

Open Access

TL;DR

CASL introduces a supervised framework that aligns sparse latent dimensions of diffusion models with semantic concepts, enabling more interpretable and controllable image generation.

Contribution

CASL is presented as the first framework to use supervision to align the sparse latent dimensions of a diffusion model with semantic concepts, improving both interpretability and control over generated images.

Findings

01. CASL achieves superior editing precision and interpretability.

02. CASL-Steer effectively reveals how concept latents influence generated content.

03. The method provides a causal probe for semantic understanding in diffusion models.

Abstract

Internal activations of diffusion models encode rich semantic information, but interpreting such representations remains challenging. While Sparse Autoencoders (SAEs) have shown promise in disentangling latent representations, existing SAE-based methods for diffusion model understanding rely on unsupervised approaches that fail to align sparse features with human-understandable concepts. This limits their ability to provide reliable semantic control over generated images. We introduce CASL (Concept-Aligned Sparse Latents), a supervised framework that aligns sparse latent dimensions of diffusion models with semantic concepts. CASL first trains an SAE on frozen U-Net activations to obtain disentangled latent representations, and then learns a lightweight linear mapping that associates each concept with a small set of relevant latent dimensions. To validate the semantic meaning of these…
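The two stages named in the abstract (a sparse autoencoder over frozen U-Net activations, followed by a lightweight linear mapping from concepts to a few latent dimensions) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the random arrays stand in for real U-Net activations and for trained SAE weights, ridge regression stands in for the paper's linear mapping, and the names `sae_encode`, `fit_concept_weights`, and `top_k_dims` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sae_encode(h, W_enc, b_enc):
    """ReLU encoder of a sparse autoencoder: activations -> non-negative latents."""
    return np.maximum(h @ W_enc + b_enc, 0.0)

def fit_concept_weights(z, y, l2=1e-2):
    """Closed-form ridge regression of a concept label onto SAE latents.

    Assumption: a stand-in for the paper's "lightweight linear mapping".
    """
    z_c = z - z.mean(axis=0)          # center latents
    y_c = y - y.mean()                # center labels
    gram = z_c.T @ z_c + l2 * np.eye(z.shape[1])
    return np.linalg.solve(gram, z_c.T @ y_c)

def top_k_dims(w, k):
    """Indices of the k latent dimensions with the largest absolute weight."""
    return np.argsort(-np.abs(w))[:k]

# Stage 1 (stand-in): pretend these are frozen U-Net activations and a trained SAE.
h = rng.normal(size=(512, 16))            # hypothetical activations
W_enc = rng.normal(size=(16, 64)) / 4.0   # hypothetical encoder weights
b_enc = np.zeros(64)
z = sae_encode(h, W_enc, b_enc)

# Stage 2: synthetic binary concept label driven by one latent (illustration only).
y = (z[:, 5] > np.median(z[:, 5])).astype(float)
w = fit_concept_weights(z, y)
concept_dims = top_k_dims(w, k=4)         # small set of latents tied to the concept
print(concept_dims)
```

Keeping only the top-k weights is one simple way to tie a concept to a small set of latent dimensions; the paper may select or regularize these dimensions differently.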

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Functional Brain Connectivity Studies