Unveiling Concept Attribution in Diffusion Models
Quang H. Nguyen, Hoang Phan, Khoa D. Doan

TL;DR
This paper introduces CAD, a framework for understanding how components in diffusion models contribute to concept generation, revealing both positive and negative influences, and enabling targeted model editing.
Contribution
The paper presents a novel component attribution method for diffusion models that uncovers both positive and negative concept-related components, enhancing interpretability and enabling precise model editing.
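The summary does not say how CAD computes its scores. As a hedged illustration of what attributing a concept to individual parameters can look like, the sketch below uses a generic first-order (gradient × weight) estimate on a toy function. `concept_score`, `concept_grad`, `attribute`, and `split_components` are hypothetical names, and the tanh "concept score" is an assumption standing in for whatever concept measure the paper actually uses. The sign of each score separates positive components (zeroing them is estimated to lower the concept score) from negative ones (zeroing them is estimated to raise it).

```python
import numpy as np

def concept_score(theta: np.ndarray, x: np.ndarray) -> float:
    """Hypothetical concept score: a toy stand-in for how strongly a model
    with flat parameter vector `theta` expresses a concept on input `x`."""
    return float(np.tanh(x @ theta))

def concept_grad(theta: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Analytic gradient of concept_score w.r.t. theta:
    d/dtheta tanh(x . theta) = (1 - tanh(x . theta)**2) * x."""
    t = np.tanh(x @ theta)
    return (1.0 - t**2) * x

def attribute(theta: np.ndarray, x: np.ndarray) -> np.ndarray:
    """First-order estimate of how much the score drops if component i is
    zeroed: c(theta) - c(theta with theta_i = 0) ~= theta_i * dc/dtheta_i."""
    return theta * concept_grad(theta, x)

def split_components(scores: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Indices of positive (concept-promoting) and negative
    (concept-suppressing) components; zero-score components go in neither."""
    positive = np.flatnonzero(scores > 0)
    negative = np.flatnonzero(scores < 0)
    return positive, negative

theta = np.array([0.5, -0.2, 0.0, 0.3])
x = np.array([1.0, 1.0, 1.0, -1.0])
scores = attribute(theta, x)      # approximately [0.5, -0.2, 0.0, -0.3]
pos, neg = split_components(scores)
print(pos.tolist(), neg.tolist())  # [0] [1, 3]
```

Here the pre-activation `x @ theta` is 0, so the gradient equals `x` and each score is simply `theta_i * x_i`: component 0 promotes the toy concept, components 1 and 3 suppress it, and component 2 is neutral.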
Findings
Identifies both positive and negative components influencing concept generation.
Demonstrates effective model editing through CAD-Erase and CAD-Amplify algorithms.
Validates the importance of both component types in generating and controlling concepts.
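The findings name CAD-Erase and CAD-Amplify but do not spell out their procedure. One reading consistent with "positive" and "negative" components is sketched below; it is an assumption, not the paper's algorithm: erasing zeroes up to k components with the largest positive attribution, and amplifying zeroes up to k components with the most negative attribution. The functions `ablate`, `erase`, and `amplify` are hypothetical, and the linear concept score is a toy for which gradient × weight attribution is exact.

```python
import numpy as np

def ablate(theta: np.ndarray, idx: list[int]) -> np.ndarray:
    """Return a copy of theta with the listed components set to zero."""
    edited = theta.copy()
    edited[np.asarray(idx, dtype=int)] = 0.0
    return edited

def erase(theta: np.ndarray, scores: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical CAD-Erase-style edit: zero up to k components with the
    largest positive attribution, weakening the concept."""
    order = np.argsort(-scores)  # most positive first
    chosen = [int(i) for i in order[:k] if scores[i] > 0]
    return ablate(theta, chosen)

def amplify(theta: np.ndarray, scores: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical CAD-Amplify-style edit: zero up to k components with the
    most negative attribution, removing what suppresses the concept."""
    order = np.argsort(scores)  # most negative first
    chosen = [int(i) for i in order[:k] if scores[i] < 0]
    return ablate(theta, chosen)

# Toy linear concept score c(theta) = w . theta; for a linear score the
# gradient x weight attribution theta_i * w_i is exact.
w = np.array([1.0, 1.0, 1.0, -1.0])
theta = np.array([0.5, -0.2, 0.0, 0.3])
scores = theta * w  # [0.5, -0.2, 0.0, -0.3]

print(w @ theta)                      # baseline, approximately 0.0
print(w @ erase(theta, scores, 1))    # approximately -0.5: concept weakened
print(w @ amplify(theta, scores, 2))  # approximately 0.5: concept strengthened
```

Both edits only zero parameters and never retrain, which is why the sign of the attribution alone decides whether an edit suppresses or strengthens the concept in this toy setting.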
Abstract
Diffusion models have shown remarkable abilities in generating realistic and high-quality images from text prompts. However, a trained model remains largely black-box; little do we know about the roles of its components in exhibiting a concept such as objects or styles. Recent works employ causal tracing to localize knowledge-storing layers in generative models without showing how other layers contribute to the target concept. In this work, we approach diffusion models' interpretability problem from a more general perspective and pose a question: "How do model components work jointly to demonstrate knowledge?" To answer this question, we decompose diffusion models using component attribution, systematically unveiling the importance of each component (specifically the model parameter) in generating a concept. The proposed framework, called Component…
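Measuring each parameter's importance by actually removing it needs one extra forward pass per component, which is impractical for a diffusion model with a very large number of parameters. The hedged sketch below contrasts exact leave-one-out ablation with a first-order estimate that needs only a single gradient, on a hypothetical tanh score. The abstract does not state that CAD uses this particular estimator, so treat it as an illustration of the cost/accuracy trade-off rather than the paper's method; all function names are hypothetical.

```python
import numpy as np

def concept_score(theta: np.ndarray, x: np.ndarray) -> float:
    """Hypothetical nonlinear concept score (toy stand-in)."""
    return float(np.tanh(x @ theta))

def ablation_attribution(theta: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Exact leave-one-out importance: score drop when component i is zeroed.
    Costs one extra forward pass per component."""
    base = concept_score(theta, x)
    out = np.empty_like(theta)
    for i in range(theta.size):
        ablated = theta.copy()
        ablated[i] = 0.0
        out[i] = base - concept_score(ablated, x)
    return out

def first_order_attribution(theta: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Linear (Taylor) estimate of the same quantity from a single gradient:
    theta_i * dc/dtheta_i, with dc/dtheta = (1 - tanh(x . theta)**2) * x."""
    t = np.tanh(x @ theta)
    return theta * (1.0 - t**2) * x

theta = np.array([0.05, -0.02, 0.03])
x = np.ones(3)
exact = ablation_attribution(theta, x)
approx = first_order_attribution(theta, x)
print(np.round(exact, 4), np.round(approx, 4))
```

The two agree closely here because the parameters are small and the score is nearly linear around them; for larger weights or more strongly nonlinear scores, the first-order estimate can drift from true ablation.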
Taxonomy
Topics: Analytical Chemistry and Chromatography · Advanced Text Analysis Techniques
Methods: Diffusion
