DiffX: Guide Your Layout to Cross-Modal Generative Modeling

Zeyu Wang; Jingyu Lin; Yifei Qian; Yi Huang; Shicen Tian; Bosong Chai; Juncan Deng; Qu Yang; Lan Du; Cunjian Chen; Kejie Huang

arXiv:2407.15488·cs.CV·October 22, 2024

TL;DR

DiffX is a diffusion model for layout-guided cross-modal image generation. It runs the diffusion and denoising processes in a latent space shared across modalities, and it adds a Joint-Modality Embedder (JME) that strengthens the interaction between layout and text conditions.

Contribution

The paper introduces DiffX, presented as the first layout-guided cross-modal diffusion model. It combines a compact generation pipeline with a new Joint-Modality Embedder (JME) that improves interaction between conditions.

Findings

01. Demonstrates robustness in RGB+X image generation on multiple datasets.

02. Shows potential for generating diverse modalities beyond RGB+X.

03. Achieves strong results guided by various layout conditions.

Abstract

Diffusion models have made significant strides in language-driven and layout-driven image generation. However, most diffusion models are limited to visible RGB image generation. In fact, human perception of the world is enriched by diverse viewpoints, such as chromatic contrast, thermal illumination, and depth information. In this paper, we introduce a novel diffusion model for general layout-guided cross-modal generation, called DiffX. Notably, our DiffX presents a compact and effective cross-modal generative modeling pipeline, which conducts diffusion and denoising processes in the modality-shared latent space. Moreover, we introduce the Joint-Modality Embedder (JME) to enhance the interaction between layout and text conditions by incorporating a gated attention mechanism. To facilitate the user-instructed training, we construct the cross-modal image datasets with detailed text…
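The abstract mentions a gated attention mechanism inside the JME but, as excerpted here, does not spell out its formulation. The sketch below is therefore only a generic illustration of gated attention, not the authors' implementation: text tokens attend to layout tokens, and the attended result is added back through a scalar gate passed through tanh. The function names, the direction of attention (text attending to layout), the tanh gate, and the omission of learned projections are all assumptions made for brevity.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention over plain lists of vectors.

    No learned projections are applied (an assumption to keep the sketch short).
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim = len(values[0])
    out = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)])
    return out


def gated_attention(tokens, cond_tokens, gate):
    """Hypothetical gated residual attention: tokens + tanh(gate) * Attn(tokens, cond)."""
    attended = attend(tokens, cond_tokens, cond_tokens)
    g = math.tanh(gate)
    return [[t + g * a for t, a in zip(tok, att)] for tok, att in zip(tokens, attended)]


# Toy inputs: two "text" tokens and three "layout" tokens, all 2-dimensional.
text_tokens = [[1.0, 0.0], [0.0, 1.0]]
layout_tokens = [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0]]

closed = gated_attention(text_tokens, layout_tokens, gate=0.0)  # gate shut
opened = gated_attention(text_tokens, layout_tokens, gate=1.0)  # gate partly open
```

With the gate at zero the layer returns its input unchanged, so a gate initialised this way lets the layout signal be introduced gradually during training. That is a common motivation for gating in conditioned attention layers; whether DiffX initialises or parameterises its gate this way is not stated in the text above.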

Code & Models

Repositories

zeyuwang-zju/diffx (PyTorch, official)

Taxonomy

Topics: Model-Driven Software Engineering Techniques · Semantic Web and Ontologies · Simulation Techniques and Applications

Methods: Softmax · Attention Is All You Need · Diffusion