3DIS: Depth-Driven Decoupled Instance Synthesis for Text-to-Image Generation

Dewei Zhou; Ji Xie; Zongxin Yang; Yi Yang

arXiv:2410.12669·cs.CV·December 3, 2025

TL;DR

3DIS is a two-stage, depth-driven framework for multi-instance text-to-image generation. It first generates a coarse scene depth map to fix the instance layout, then renders instance attributes with a pre-trained ControlNet, so the rendering stage needs no additional training.

Contribution

The paper decouples multi-instance generation into scene layout (depth map generation) and attribute rendering. Because rendering relies on pre-trained ControlNet models, the same layout stage can be paired with different foundational models without retraining the renderer.

Findings

01. Outperforms existing methods in layout precision and attribute rendering.

02. Demonstrates robustness and adaptability across diverse foundational models.

03. Achieves significant improvements on COCO benchmarks.

Abstract

The increasing demand for controllable outputs in text-to-image generation has spurred advancements in multi-instance generation (MIG), allowing users to define both instance layouts and attributes. However, unlike image-conditional generation methods such as ControlNet, MIG techniques have not been widely adopted in state-of-the-art models like SD2 and SDXL, primarily due to the challenge of building robust renderers that simultaneously handle instance positioning and attribute rendering. In this paper, we introduce Depth-Driven Decoupled Instance Synthesis (3DIS), a novel framework that decouples the MIG process into two stages: (i) generating a coarse scene depth map for accurate instance positioning and scene composition, and (ii) rendering fine-grained attributes using pre-trained ControlNet on any foundational model, without additional training. Our 3DIS framework integrates a…
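The two-stage split described in the abstract can be illustrated with a minimal data-flow sketch. This is not the authors' implementation: `Instance`, `coarse_depth_from_layout`, and `synthesize` are hypothetical names, the box-painting function is only a stand-in for the learned depth generator of stage (i), and `render` stands in for a depth-conditioned ControlNet renderer of stage (ii). The convention that larger depth values mean "closer" is also an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]   # (x0, y0, x1, y1) in pixels; assumed convention
DepthMap = List[List[float]]      # row-major grid of depth values


@dataclass
class Instance:
    box: Box       # where the instance should appear
    prompt: str    # attribute description, e.g. "a red car"
    depth: float   # placeholder depth in [0, 1]; larger = closer (assumption)


def coarse_depth_from_layout(instances: List[Instance],
                             width: int, height: int) -> DepthMap:
    """Stand-in for stage (i): paint each box with a constant depth value.

    The paper generates the depth map with a model; this function only
    mimics the interface (layout in, coarse depth map out).
    """
    depth = [[0.0] * width for _ in range(height)]
    # Paint far instances first so that closer ones overwrite them.
    for inst in sorted(instances, key=lambda i: i.depth):
        x0, y0, x1, y1 = inst.box
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                depth[y][x] = inst.depth
    return depth


def synthesize(instances: List[Instance], width: int, height: int,
               render: Callable[[DepthMap, List[Tuple[Box, str]]], object]):
    """Run stage (i), then hand the depth map and prompts to stage (ii)."""
    depth = coarse_depth_from_layout(instances, width, height)
    # `render` is a hypothetical callable standing in for a pre-trained,
    # depth-conditioned renderer; it receives positions only via `depth`
    # plus the per-instance (box, prompt) pairs for attribute rendering.
    return render(depth, [(i.box, i.prompt) for i in instances])
```

The point of the sketch is the interface: the renderer never has to learn instance positioning, since the layout is already encoded in the depth map it is conditioned on, which is what lets a pre-trained renderer be swapped in.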

Code & Models

Repositories

limuloo/3DIS (PyTorch, official)

Models

sanaka87/3DIS (Hugging Face model, 27 downloads, 7 likes)

Taxonomy

Topics: Human Motion and Animation · Computer Graphics and Visualization Techniques · Image Processing and 3D Reconstruction

Methods: Adapter