Compositional Image Decomposition with Diffusion Models
Jocelin Su, Nan Liu, Yanbo Wang, Joshua B. Tenenbaum, Yilun Du

TL;DR
This paper introduces Decomp Diffusion, an unsupervised diffusion-based method for decomposing images into components like objects and lighting, enabling flexible scene recomposition and novel scene generation.
Contribution
The paper presents a novel unsupervised approach to decompose images into diffusion model components, allowing flexible scene editing and recomposition beyond training data.
Findings
Successfully decomposes images into meaningful components
Enables recomposition of scenes from different components
Demonstrates scene generation with novel combinations
Abstract
Given an image of a natural scene, we are able to quickly decompose it into a set of components such as objects, lighting, shadows, and foreground. We can then envision a scene where we combine certain components with those from other images, for instance a set of objects from our bedroom and animals from a zoo under the lighting conditions of a forest, even if we have never encountered such a scene before. In this paper, we present a method to decompose an image into such compositional components. Our approach, Decomp Diffusion, is an unsupervised method which, when given a single image, infers a set of different components in the image, each represented by a diffusion model. We demonstrate how components can capture different factors of the scene, ranging from global scene descriptors like shadows or facial expression to local scene descriptors like constituent objects. We further…
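The recomposition idea in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration, not the authors' implementation: `component_eps` is a hypothetical stand-in for a latent-conditioned denoiser, each component is given a simple quadratic energy, the per-component predictions are averaged (the page does not state how they are weighted), and sampling is replaced by noise-free gradient descent.

```python
import numpy as np

def component_eps(x_t, z_k):
    """Hypothetical stand-in for a latent-conditioned denoiser.

    Assumes a quadratic energy E_k(x) = ||x - z_k||^2 / 2 per component,
    whose gradient with respect to x is (x - z_k).
    """
    return x_t - z_k

def composed_eps(x_t, latents):
    """Average the per-component predictions (an assumed combination rule)."""
    return np.mean([component_eps(x_t, z) for z in latents], axis=0)

def sample(latents, steps=200, step_size=0.1, seed=0):
    """Noise-free gradient descent on the composed energy (illustration only)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=latents[0].shape)
    for _ in range(steps):
        x = x - step_size * composed_eps(x, latents)
    return x

# Components "inferred" from two hypothetical images, A and B.
z_a = [np.array([1.0, 0.0]), np.array([0.0, 2.0])]
z_b = [np.array([-3.0, 1.0]), np.array([4.0, 4.0])]

# Recomposition: take the first component from A and the second from B.
recombined = [z_a[0], z_b[1]]
x = sample(recombined)
```

In this toy setting the sample settles at the point that minimizes the averaged energy of the chosen components, which is how swapping a component from another image changes the result.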
Peer Reviews
Decision: ICML 2024 Poster
Unsupervised image intrinsic decomposition/re-composition is very challenging and one of the most fundamental open issues in computer vision. Using diffusion models for this purpose seems a natural choice (given the success of DM in natural image generation, and in learning semantic image properties). The authors give a rigorous justification of their choices from a mathematical point of view. The paper's idea is well argued. The illustrated results show the strong potential of the approach. I
Qualitative results are promising but still leave room for improvement: reconstructed images appear blurry and are low-resolution. At this stage this is not a major issue, and it might be improved by further work.
+ The paper addresses compositional modeling for images using denoising diffusion models. The recomposition quality seems promising.
+ The paper shows that energy functions are additive across primitives.
+ The method seems to be similar to [1].
+ What is the computational cost? It may take more space and computational resources with K diffusion models.
[1] Du et al., Reduce, Reuse, Recycle: Compositional Generation with Energy-Based Diffusion Models and MCMC, ICML 2023
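The additivity remark in this review can be written out as a short sketch under the usual energy-based reading of diffusion models. The symbols $E_k$, $p_k$, $\epsilon_k$, $\sigma_t$, and $K$ are notation assumed here, not taken from this page, and the last line relies on the common approximation that a noise prediction is a scaled negative score.

```latex
% Assumed notation: component k has density p_k(x) \propto \exp(-E_k(x)).
\begin{align}
  p(x) &\propto \prod_{k=1}^{K} p_k(x)
        \propto \exp\!\Big(-\sum_{k=1}^{K} E_k(x)\Big), \\
  \nabla_x \log p(x) &= -\sum_{k=1}^{K} \nabla_x E_k(x), \\
  \epsilon_k(x_t, t) &\approx -\sigma_t \nabla_{x_t} \log p_k(x_t)
        = \sigma_t \nabla_{x_t} E_k(x_t).
\end{align}
```

Under these assumptions, adding energies corresponds to adding per-component noise predictions, which is also why the reviewer's cost question is natural: a naive implementation evaluates one prediction per component at every denoising step.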
The idea of leveraging the connection between Energy-based models and diffusion models for image decomposition is interesting and effective. The compositional concepts from images can be discovered in an unsupervised manner. The experimental results show that the proposed method can discover both global and local concepts, and be used for component compositions across multiple datasets and models.
1. The quantitative evaluation is not thorough. The current quantitative evaluation only focuses on the global factors, while the quantitative evaluation for the local factors and cross dataset generalization is missing. In contrast, the existing work (COMET) contains quantitative comparisons for the object-level decomposition.
2. As the proposed method contains a set of diffusion models, the computational cost of the proposed method and existing works should be discussed in the paper.
3. For tr
Taxonomy
Topics: Geochemistry and Geologic Mapping · Hydrocarbon exploration and reservoir analysis
Methods: Sparse Evolutionary Training · Diffusion
