Beyond the Black Box: Identifiable Interpretation and Control in Generative Models via Causal Minimality
Lingjing Kong, Shaoan Xie, Guangyi Chen, Yuewen Sun, Xiangchen Song, Eric P. Xing, Kun Zhang

TL;DR
This paper establishes a causal minimality framework for interpretable generative models, enabling clear causal understanding and control of latent representations, with empirical validation on text-to-image diffusion models.
Contribution
It introduces a theoretical foundation for hierarchical, interpretable generative models based on causal minimality, and demonstrates practical extraction and control of concepts in diffusion models.
Findings
Latent representations can be aligned with true data-generating variables.
Hierarchical concept graphs can be extracted from diffusion models.
Causal constraints enable fine-grained model steering.
Abstract
Deep generative models, while revolutionizing fields like image and text generation, largely operate as opaque "black boxes", hindering human understanding, control, and alignment. While methods like sparse autoencoders (SAEs) show remarkable empirical success, they often lack theoretical guarantees, risking subjective insights. Our primary objective is to establish a principled foundation for interpretable generative models. We demonstrate that the principle of causal minimality (favoring the simplest causal explanation) can endow the latent representations of modern generative models with clear causal interpretation and robust, component-wise identifiable control. We introduce a novel theoretical framework for hierarchical selection models, where higher-level concepts emerge from the constrained composition of lower-level variables, better capturing the complex dependencies in…
