Scene Graph Disentanglement and Composition for Generalizable Complex Image Generation
Yunnan Wang, Ziqiang Li, Zequn Zhang, Wenyao Zhang, Baao Xie, Xihui Liu, Wenjun Zeng, Xin Jin

TL;DR
This paper introduces a novel framework combining scene graph representations with variational autoencoders and diffusion models to improve complex scene image generation, enabling better diversity, control, and manipulation.
Contribution
The paper proposes an approach built on three components (SL-VAE, CMA, and MLS) for disentangled, compositional, and manipulable image generation from scene graphs.
Findings
Outperforms recent methods in generation quality and controllability
Enables diverse and reasonable scene generation from scene graphs
Allows graph manipulation with consistent visual content
Abstract
There has been exciting progress in generating images from natural language or layout conditions. However, these methods struggle to faithfully reproduce complex scenes due to the insufficient modeling of multiple objects and their relationships. To address this issue, we leverage the scene graph, a powerful structured representation, for complex image generation. Different from the previous works that directly use scene graphs for generation, we employ the generative capabilities of variational autoencoders and diffusion models in a generalizable manner, compositing diverse disentangled visual clues from scene graphs. Specifically, we first propose a Semantics-Layout Variational AutoEncoder (SL-VAE) to jointly derive (layouts, semantics) from the input scene graph, which allows a more diverse and reasonable generation in a one-to-many mapping. We then develop a Compositional Masked…
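To make the "one-to-many mapping" idea concrete, here is a minimal sketch of a scene graph stored as (subject, predicate, object) triples, together with a toy sampler that returns a different layout for each random draw. It is only an illustration under stated assumptions: `SceneGraph` and `sample_layout` are hypothetical names, and the uniform random sampler stands in for a learned decoder. It is not the paper's SL-VAE and it ignores the predicates entirely.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneGraph:
    """Hypothetical container: node labels plus (subject, predicate, object) triples."""
    objects: tuple   # e.g. ("sheep", "grass")
    triples: tuple   # e.g. ((0, "standing on", 1),), indices into `objects`


def sample_layout(graph, seed):
    """Toy stand-in for a learned layout decoder (assumption, not the paper's model).

    Each call draws one random box per object, so a single graph can map to
    many layouts. Boxes are (x0, y0, x1, y1) in normalized [0, 1] coordinates.
    """
    rng = random.Random(seed)
    layout = {}
    for idx, label in enumerate(graph.objects):
        w = rng.uniform(0.2, 0.6)
        h = rng.uniform(0.2, 0.6)
        x0 = rng.uniform(0.0, 1.0 - w)
        y0 = rng.uniform(0.0, 1.0 - h)
        layout[idx] = (label, (x0, y0, x0 + w, y0 + h))
    return layout


graph = SceneGraph(
    objects=("sheep", "grass"),
    triples=((0, "standing on", 1),),
)

# Same graph, two different random draws: two different candidate layouts.
layout_a = sample_layout(graph, seed=0)
layout_b = sample_layout(graph, seed=1)
```

In a real model the random draw would be a latent variable conditioned on the graph, and the decoder would be trained so that sampled layouts respect relationships such as "standing on"; the sketch only shows the shape of the interface.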
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Generative Adversarial Networks and Image Synthesis
Methods: Softmax · Attention Is All You Need · Diffusion
