Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis

Jonghyun Lee; Hansam Cho; Youngjoon Yoo; Seoung Bum Kim; Yonghyun Jeong

arXiv:2401.09048 · cs.CV · January 18, 2024 · 1 citation

TL;DR

This paper introduces a diffusion-based 3D-aware image synthesis method that localizes objects at different depths and combines multiple global styles using depth disentanglement and soft guidance techniques.

Contribution

It presents a novel framework, Compose and Conquer (CnC), integrating depth disentanglement and soft guidance for 3D-aware, multi-condition localized image synthesis.

Findings

01 Enables accurate 3D object placement in generated images

02 Allows compositional control of global semantics and object depth

03 Demonstrates versatility in synthesizing complex scenes

Abstract

Addressing the limitations of text as a source of accurate layout representation in text-conditional diffusion models, many works incorporate additional signals to condition certain attributes within a generated image. Although successful, previous works do not account for the specific localization of said attributes extended into the three-dimensional plane. In this context, we present a conditional diffusion model that integrates control over three-dimensional object placement with disentangled representations of global stylistic semantics from multiple exemplar images. Specifically, we first introduce depth disentanglement training to leverage the relative depth of objects as an estimator, allowing the model to identify the absolute positions of unseen objects through the use of synthetic image triplets. We also introduce soft guidance, a method for imposing global…

Code & Models

Repositories

tomtom1103/compose-and-conquer (PyTorch, official)

Videos

Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis · SlidesLive

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Handwritten Text Recognition Techniques · Human Motion and Animation

Methods: Diffusion