Scene Graph Disentanglement and Composition for Generalizable Complex Image Generation

Yunnan Wang; Ziqiang Li; Zequn Zhang; Wenyao Zhang; Baao Xie; Xihui Liu; Wenjun Zeng; Xin Jin

arXiv:2410.00447·cs.CV·October 2, 2024

TL;DR

This paper introduces a novel framework combining scene graph representations with variational autoencoders and diffusion models to improve complex scene image generation, enabling better diversity, control, and manipulation.

Contribution

The paper proposes three components, SL-VAE, CMA, and MLS, which together support disentangled, compositional, and manipulable scene-graph-based image generation.

Findings

01. Outperforms recent methods in generation quality and controllability

02. Enables diverse and reasonable scene generation from scene graphs

03. Allows graph manipulation with consistent visual content

Abstract

There has been exciting progress in generating images from natural language or layout conditions. However, these methods struggle to faithfully reproduce complex scenes due to the insufficient modeling of multiple objects and their relationships. To address this issue, we leverage the scene graph, a powerful structured representation, for complex image generation. Different from the previous works that directly use scene graphs for generation, we employ the generative capabilities of variational autoencoders and diffusion models in a generalizable manner, compositing diverse disentangled visual clues from scene graphs. Specifically, we first propose a Semantics-Layout Variational AutoEncoder (SL-VAE) to jointly derive (layouts, semantics) from the input scene graph, which allows a more diverse and reasonable generation in a one-to-many mapping. We then develop a Compositional Masked…
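To make the "one-to-many mapping" idea concrete, here is a minimal sketch of a scene graph stored as (subject, predicate, object) triples, together with a toy sampler that draws a different layout for the same graph on each call. It is not the paper's SL-VAE: the names SceneGraph and sample_layout are hypothetical, the Gaussian mean and scale are hard-coded assumptions, and a trained model would predict them from the graph instead.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneGraph:
    """Hypothetical container: object labels plus relation triples."""
    objects: tuple   # e.g. ("sheep", "grass")
    triples: tuple   # (subject_index, predicate, object_index)


def clamp(value, low, high):
    return min(max(value, low), high)


def sample_layout(graph, seed):
    """Toy stand-in for a learned decoder: one normalized box per object.

    Each coordinate is drawn as mu + sigma * eps with eps ~ N(0, 1).
    mu and sigma are fixed constants here (an assumption for illustration);
    a VAE would infer them from the scene graph.
    """
    rng = random.Random(seed)
    boxes = []
    for _ in graph.objects:
        cx = 0.5 + 0.15 * rng.gauss(0.0, 1.0)
        cy = 0.5 + 0.15 * rng.gauss(0.0, 1.0)
        w = clamp(0.3 + 0.05 * rng.gauss(0.0, 1.0), 0.05, 1.0)
        h = clamp(0.3 + 0.05 * rng.gauss(0.0, 1.0), 0.05, 1.0)
        # Keep the box inside the unit square.
        x0 = clamp(cx - w / 2, 0.0, 1.0 - w)
        y0 = clamp(cy - h / 2, 0.0, 1.0 - h)
        boxes.append((x0, y0, x0 + w, y0 + h))
    return boxes


graph = SceneGraph(
    objects=("sheep", "grass"),
    triples=((0, "standing on", 1),),
)
layout_a = sample_layout(graph, seed=0)
layout_b = sample_layout(graph, seed=1)
```

The same graph yields different layouts under different seeds, which is the property the abstract attributes to sampling from a variational model; the sketch ignores the relation triples entirely, whereas a real model would condition on them.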

Videos

Scene Graph Disentanglement and Composition for Generalizable Complex Image Generation · SlidesLive

Taxonomy

Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Generative Adversarial Networks and Image Synthesis

Methods: Softmax · Attention Is All You Need · Diffusion