Compositional Image Decomposition with Diffusion Models

Jocelin Su; Nan Liu; Yanbo Wang; Joshua B. Tenenbaum; Yilun Du

arXiv:2406.19298·cs.CV·June 28, 2024

Open Access · 3 Reviews

TL;DR

This paper introduces Decomp Diffusion, an unsupervised diffusion-based method for decomposing images into components like objects and lighting, enabling flexible scene recomposition and novel scene generation.

Contribution

The paper presents a novel unsupervised approach to decompose images into diffusion model components, allowing flexible scene editing and recomposition beyond training data.

Findings

01

Successfully decomposes images into meaningful components

02

Enables recomposition of scenes from different components

03

Demonstrates scene generation with novel combinations

Abstract

Given an image of a natural scene, we are able to quickly decompose it into a set of components such as objects, lighting, shadows, and foreground. We can then envision a scene where we combine certain components with those from other images, for instance a set of objects from our bedroom and animals from a zoo under the lighting conditions of a forest, even if we have never encountered such a scene before. In this paper, we present a method to decompose an image into such compositional components. Our approach, Decomp Diffusion, is an unsupervised method which, when given a single image, infers a set of different components in the image, each represented by a diffusion model. We demonstrate how components can capture different factors of the scene, ranging from global scene descriptors like shadows or facial expression to local scene descriptors like constituent objects. We further…
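The recomposition idea in the abstract can be illustrated with a toy sketch. It assumes, hypothetically, that each component is a latent vector conditioning a shared denoiser, and that components are combined by averaging their per-component noise predictions, which matches the additive-energy view mentioned in the reviews below. The page itself does not state the exact rule, and `toy_denoiser`, `composed_noise`, and `recombine` are illustrative names, not the authors' code.

```python
from typing import List, Set

Vector = List[float]


def toy_denoiser(x: Vector, t: int, z: Vector) -> Vector:
    """Hypothetical stand-in for a latent-conditioned noise predictor.

    A real model would be a neural network; here the "noise" is simply
    the offset of x from the component latent z, and t is unused.
    """
    return [xi - zi for xi, zi in zip(x, z)]


def composed_noise(x: Vector, t: int, latents: List[Vector]) -> Vector:
    """Average the per-component noise predictions (assumed composition rule)."""
    preds = [toy_denoiser(x, t, z) for z in latents]
    k = len(preds)
    return [sum(p[i] for p in preds) / k for i in range(len(x))]


def recombine(latents_a: List[Vector], latents_b: List[Vector],
              take_from_b: Set[int]) -> List[Vector]:
    """Build a new component set: slots in take_from_b come from image B,
    all other slots come from image A."""
    return [zb if i in take_from_b else za
            for i, (za, zb) in enumerate(zip(latents_a, latents_b))]


if __name__ == "__main__":
    image_a = [[0.0, 0.0], [2.0, 2.0]]   # two toy components of image A
    image_b = [[4.0, 4.0], [6.0, 6.0]]   # two toy components of image B
    mixed = recombine(image_a, image_b, take_from_b={1})
    print(mixed)
    print(composed_noise([1.0, 1.0], t=0, latents=mixed))
```

Swapping a slot in `recombine` changes only that component's contribution to the averaged prediction, which is the mechanism that would let components from different images be mixed at sampling time under these assumptions.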

Peer Reviews

Decision·ICML 2024 Poster

Reviewer 01 · Rating 8 (accept, good paper) · Confidence 4

Strengths

Unsupervised image intrinsic decomposition/re-composition is very challenging and one of the most fundamental open issues in computer vision. Using diffusion models for this purpose seems a natural choice (given the success of DM in natural image generation, and in learning semantic image properties). The authors give a rigorous justification of their choices from a mathematical point of view. The paper's idea is well argued. The illustrated results show the strong potential of the approach. I

Weaknesses

Qualitative results are promising but still leave room for improvement. Reconstructed images appear blurry and low-resolution. At this stage, however, this is not a major issue and might be improved by further work.

Reviewer 02 · Rating 5 (marginally below the acceptance threshold) · Confidence 4

Strengths

+ The paper addresses compositional modeling for images using denoising diffusion models. The recomposition quality seems promising.

+ The paper shows that energy functions are additive over primitives.

Weaknesses

+ The method seems to be similar to [1].

+ What is the computational cost? It may take more space and computational resources with K diffusion models.

[1] Du et al., Reduce, Reuse, Recycle: Compositional Generation with Energy-Based Diffusion Models and MCMC, ICML 2023

Reviewer 03 · Rating 6 (marginally above the acceptance threshold) · Confidence 4

Strengths

The idea of leveraging the connection between Energy-based models and diffusion models for image decomposition is interesting and effective. The compositional concepts from images can be discovered in an unsupervised manner. The experimental results show that the proposed method can discover both global and local concepts, and be used for component compositions across multiple datasets and models.

Weaknesses

1. The quantitative evaluation is not thorough. The current quantitative evaluation only focuses on the global factors, while the quantitative evaluation for the local factors and cross dataset generalization is missing. In contrast, the existing work (COMET) contains quantitative comparisons for the object-level decomposition.

2. As the proposed method contains a set of diffusion models, the computational cost of the proposed method and existing works should be discussed in the paper.

3. For tr

Taxonomy

Topics: Geochemistry and Geologic Mapping · Hydrocarbon exploration and reservoir analysis

Methods: Sparse Evolutionary Training · Diffusion