Deep Dreams Are Made of This: Visualizing Monosemantic Features in Diffusion Models
Adam Szokalski, Mateusz Modrzejewski

TL;DR
This paper introduces latent visualization by optimization (LVO), a technique that extends feature visualization to latent diffusion models. It uses sparse autoencoders to disentangle layer representations, which yields clear visualizations of monosemantic features.
Contribution
The paper presents LVO, an interpretability method for diffusion models that disentangles polysemantic layer representations into monosemantic features and visualizes them by optimizing directly in the latent space.
Findings
SAE features produce clear visualizations of recognizable concepts
Regularization techniques transfer from pixel-space to latent domain
LVO provides insights into feature activation mechanisms
Abstract
This paper proposes latent visualization by optimization (LVO), a mechanistic interpretability technique that extends feature visualization by optimization - originally developed for convolutional neural networks - to latent diffusion models. LVO employs sparse autoencoders (SAEs) to disentangle polysemantic layer representations into monosemantic features. Key contributions include latent-space optimization, time-step activity analysis, schedule-matched noise injection, prior initialization through feature steering, and suitable regularization strategies. We demonstrate the method on Stable Diffusion 1.5 fine-tuned on the Style50 dataset, showing that SAE features produce clear visualizations of recognizable concepts - including diagonal compositions, human figures, roses, cables, and waterfall foam - that correlate with dataset examples, while the baseline without disentanglement…
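To make the core idea concrete, here is a minimal sketch of feature visualization by optimization aimed at a single SAE feature. It is a toy illustration, not the authors' implementation: the 4-dimensional "latent", the 3-unit tanh "layer", the SAE encoder row (`W_LAYER`, `W_ENC`, `B_ENC`), and all hyperparameters are hypothetical stand-ins for a real diffusion model layer and a trained sparse autoencoder. Plain L2 regularization stands in for the regularization strategies mentioned in the abstract, and the sketch omits schedule-matched noise injection, prior initialization through feature steering, and time-step analysis.

```python
import math

# Hypothetical toy weights (assumptions for illustration only).
W_LAYER = [  # maps a 4-d latent to a 3-unit hidden layer
    [0.5, -0.2, 0.1, 0.0],
    [0.3, 0.8, -0.5, 0.2],
    [-0.4, 0.1, 0.9, -0.7],
]
W_ENC = [1.0, -0.5, 0.8]  # encoder row of one SAE feature
B_ENC = -0.1              # encoder bias of that feature


def feature_preactivation(z):
    """Return the SAE feature's pre-ReLU value and the hidden activations."""
    h = [math.tanh(sum(w * zj for w, zj in zip(row, z))) for row in W_LAYER]
    pre = sum(we * hi for we, hi in zip(W_ENC, h)) + B_ENC
    return pre, h


def feature_activation(z):
    """SAE feature activation: ReLU applied to the pre-activation."""
    return max(feature_preactivation(z)[0], 0.0)


def visualize_feature(steps=300, lr=0.1, l2=0.05):
    """Gradient ascent on the latent z to maximize one SAE feature.

    The pre-ReLU value is optimized so the gradient does not vanish
    while the feature is still inactive. An L2 penalty keeps z bounded.
    """
    z = [0.0] * len(W_LAYER[0])  # placeholder start, not a learned prior
    for _ in range(steps):
        _, h = feature_preactivation(z)
        # d(pre)/d(z_j) = sum_i W_ENC[i] * (1 - h_i^2) * W_LAYER[i][j]
        grad = [
            sum(W_ENC[i] * (1.0 - h[i] ** 2) * W_LAYER[i][j]
                for i in range(len(h))) - 2.0 * l2 * z[j]
            for j in range(len(z))
        ]
        z = [zj + lr * gj for zj, gj in zip(z, grad)]
    return z


z_opt = visualize_feature()
print("activation at start:", feature_activation([0.0] * 4))
print("activation after optimization:", feature_activation(z_opt))
```

In a real setting the optimized latent would be decoded to an image for inspection. Here the only observable is that the feature's activation rises from zero as the latent is optimized.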
