$\textit{Revelio}$: Interpreting and leveraging semantic information in diffusion models
Dahye Kim, Xavier Thomas, Deepti Ghadiyaram

TL;DR
This paper investigates how semantic visual information is encoded in diffusion models, revealing interpretable features and demonstrating their utility for transfer learning across multiple datasets.
Contribution
It introduces a method using k-sparse autoencoders to interpret diffusion model features and analyzes how architecture and training data affect representation quality.
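To make the k-sparse autoencoder idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the class name `KSparseAutoencoder`, the tied decoder initialization, the choice to apply top-k directly to the raw pre-activations, and all dimensions are assumptions chosen for illustration. The only property it shows is the defining one: each input is encoded with at most k active latent units.

```python
import random


def top_k_activation(pre_acts, k):
    """Keep the k largest pre-activations and zero out the rest."""
    if k <= 0:
        return [0.0] * len(pre_acts)
    order = sorted(range(len(pre_acts)), key=lambda i: pre_acts[i], reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(pre_acts)]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


class KSparseAutoencoder:
    """Hypothetical k-SAE: linear encoder, top-k sparsity, linear decoder."""

    def __init__(self, input_dim, hidden_dim, k, seed=0):
        rng = random.Random(seed)
        self.k = k
        self.w_enc = [[rng.gauss(0.0, 0.1) for _ in range(input_dim)]
                      for _ in range(hidden_dim)]
        # Assumption: decoder starts as the transpose of the encoder.
        self.w_dec = [[self.w_enc[h][d] for h in range(hidden_dim)]
                      for d in range(input_dim)]
        self.b_dec = [0.0] * input_dim

    def encode(self, x):
        centered = [xi - bi for xi, bi in zip(x, self.b_dec)]
        return top_k_activation(matvec(self.w_enc, centered), self.k)

    def decode(self, z):
        return [r + b for r, b in zip(matvec(self.w_dec, z), self.b_dec)]

    def reconstruction_loss(self, x):
        """Mean squared error between the input and its reconstruction."""
        x_hat = self.decode(self.encode(x))
        return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


# Usage: only the two largest entries survive the sparsity step.
codes = top_k_activation([0.1, 3.0, -2.0, 1.5], k=2)  # [0.0, 3.0, 0.0, 1.5]
```

In practice such a model would be trained with an autodiff framework on features extracted from a diffusion model; the sketch omits training and covers only the forward pass.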
Findings
Diffusion features contain rich, interpretable semantic information.
Transfer learning with diffusion features is effective across datasets.
Different architectures and datasets influence feature granularity and biases.
Abstract
We study how rich visual semantic information is represented within various layers and denoising timesteps of different diffusion architectures. We uncover monosemantic interpretable features by leveraging k-sparse autoencoders (k-SAE). We substantiate our mechanistic interpretations via transfer learning using light-weight classifiers on off-the-shelf diffusion models' features. On multiple datasets, we demonstrate the effectiveness of diffusion features for representation learning. We provide an in-depth analysis of how different diffusion architectures, pre-training datasets, and language model conditioning impact visual representation granularity, inductive biases, and transfer learning capabilities. Our work is a critical step towards deepening interpretability of black-box diffusion models. Code and visualizations are available at: https://github.com/revelio-diffusion/revelio
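The transfer-learning setup described in the abstract, a light-weight classifier on frozen diffusion features, can be illustrated with a small linear probe. This is a sketch under stated assumptions, not the paper's pipeline: global average pooling, the softmax-regression probe, the class name `LinearProbe`, the hyperparameters, and the toy feature maps are all hypothetical stand-ins for features that would really come from a chosen layer and timestep of a diffusion model.

```python
import math


def global_average_pool(feature_map):
    """Collapse a C x H x W feature map (nested lists) into a length-C vector."""
    return [sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
            for channel in feature_map]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LinearProbe:
    """Hypothetical softmax-regression probe trained on frozen features."""

    def __init__(self, feature_dim, num_classes):
        self.weights = [[0.0] * feature_dim for _ in range(num_classes)]
        self.biases = [0.0] * num_classes

    def predict_proba(self, features):
        logits = [sum(w * f for w, f in zip(row, features)) + b
                  for row, b in zip(self.weights, self.biases)]
        return softmax(logits)

    def predict(self, features):
        probs = self.predict_proba(features)
        return max(range(len(probs)), key=probs.__getitem__)

    def fit(self, feature_vectors, labels, epochs=200, lr=0.5):
        """Plain SGD on the cross-entropy loss; the features are never updated."""
        for _ in range(epochs):
            for features, label in zip(feature_vectors, labels):
                probs = self.predict_proba(features)
                for c in range(len(self.weights)):
                    grad = probs[c] - (1.0 if c == label else 0.0)
                    for d, f in enumerate(features):
                        self.weights[c][d] -= lr * grad * f
                    self.biases[c] -= lr * grad


# Toy stand-ins for diffusion feature maps (2 channels, 2 x 2 spatial grid).
toy_maps = [
    [[[1.0, 0.8], [0.9, 1.1]], [[0.0, 0.1], [0.1, 0.0]]],  # class 0
    [[[0.1, 0.0], [0.0, 0.1]], [[1.0, 0.9], [1.1, 0.8]]],  # class 1
]
toy_labels = [0, 1]

pooled = [global_average_pool(m) for m in toy_maps]
probe = LinearProbe(feature_dim=2, num_classes=2)
probe.fit(pooled, toy_labels)
predictions = [probe.predict(p) for p in pooled]
```

Keeping the probe linear means classification accuracy reflects what the frozen features already encode rather than what a large classifier could learn on top of them.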
Taxonomy
Topics: Music and Audio Processing
Methods: Diffusion
