Coupled Inference in Diffusion Models for Semantic Decomposition
Calvin Yeung, Ali Zakeri, Zhuowen Zou, Mohsen Imani

TL;DR
This paper introduces a novel coupled inference framework using diffusion models for semantic decomposition, improving over resonator networks in recognizing and reconstructing compositional visual scene factors.
Contribution
It presents a method that couples diffusion processes for semantic decomposition through a reconstruction-driven guidance term and an iterative sampling scheme, and it recovers resonator networks as a special case.
Findings
Outperforms resonator networks on synthetic tasks
Introduces a reconstruction-driven guidance term
Develops an iterative sampling scheme for better performance
Abstract
Many visual scenes can be described as compositions of latent factors. Effective recognition, reasoning, and editing often require not only forming such compositional representations, but also solving the decomposition problem. One popular choice for constructing these representations is through the binding operation. Resonator networks, which can be understood as coupled Hopfield networks, were proposed as a way to perform decomposition on such bound representations. Recent works have shown notable similarities between Hopfield networks and diffusion models. Motivated by these observations, we introduce a framework for semantic decomposition using coupled inference in diffusion models. Our method frames semantic decomposition as an inverse problem and couples the diffusion processes using a reconstruction-driven guidance term that encourages the composition of factor estimates to match…
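To make the binding and decomposition problem concrete, here is a minimal NumPy sketch of a classical resonator network, the baseline the abstract refers to. It is not the paper's diffusion-based method. It assumes random bipolar codevectors, element-wise (Hadamard) multiplication as the binding operation, and sequential factor updates; the names `bipolar_sign` and `resonator_decompose`, the dimension `D`, and the codebook sizes are all illustrative choices rather than anything taken from the paper.

```python
import numpy as np

def bipolar_sign(v):
    # Map a vector to {-1, +1}; zeros are broken towards +1 (an arbitrary choice).
    return np.where(v >= 0, 1, -1)

def resonator_decompose(s, codebooks, n_iters=50):
    """Factor a bound vector s into one codevector index per codebook.

    codebooks: list of (N, D) arrays of bipolar codevectors (hypothetical layout).
    """
    # Start each factor estimate from the superposition of all its candidates.
    estimates = [bipolar_sign(cb.sum(axis=0)) for cb in codebooks]
    for _ in range(n_iters):
        for f, cb in enumerate(codebooks):
            # Unbind the other factors' current estimates from s.
            # For bipolar vectors, Hadamard binding is its own inverse.
            others = np.ones_like(s)
            for g, est in enumerate(estimates):
                if g != f:
                    others = others * est
            query = s * others
            # Hopfield-style cleanup: weight each codevector by its similarity
            # to the query, superpose, and threshold back to bipolar values.
            estimates[f] = bipolar_sign(cb.T @ (cb @ query))
    # Read out the best-matching codevector index for each factor.
    return [int(np.argmax(cb @ est)) for cb, est in zip(codebooks, estimates)]

rng = np.random.default_rng(0)
D, N, F = 4096, 10, 3  # vector dimension, codevectors per factor, number of factors
codebooks = [rng.choice([-1, 1], size=(N, D)) for _ in range(F)]

# Compose a "scene" by binding one codevector from each factor.
true_idx = [3, 7, 1]
s = np.prod([cb[i] for cb, i in zip(codebooks, true_idx)], axis=0)

decoded = resonator_decompose(s, codebooks)
print(decoded, true_idx)
```

Each factor's update depends on the current estimates of all the other factors, which is the coupling the abstract describes when it calls resonator networks coupled Hopfield networks. The sizes here are kept far below the network's capacity so that the search over the 1000 possible combinations settles quickly.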
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Face Recognition and Perception · Domain Adaptation and Few-Shot Learning
