Setting the Stage: Text-Driven Scene-Consistent Image Generation
Cong Xie, Che Wang, Yan Zhang, Ruiqi Yu, Han Zou, Zheng Pan, Zhenpeng Zhan

TL;DR
This paper introduces a method for scene staging, which synthesizes an image from a reference scene and a text condition. It addresses the scarcity of paired training data with a new data construction pipeline and reports better scene alignment, text alignment, and diversity than existing methods.
Contribution
It proposes a data construction pipeline and a correspondence-guided attention loss to improve scene and text alignment in image generation.
Findings
Outperforms state-of-the-art methods in scene and text alignment.
Generates diverse viewpoints and compositions.
Maintains reference scene identity while following textual instructions.
Abstract
We focus on the foundational task of Scene Staging: given a reference scene image and a text condition specifying an actor category to be generated in the scene and its spatial relation to the scene, the goal is to synthesize an output image that preserves the same scene identity as the reference image while correctly generating the actor according to the spatial relation described in the text. Existing methods struggle with this task, largely due to the scarcity of high-quality paired data and unconstrained generation objectives. To overcome the data bottleneck, we propose a novel data construction pipeline that combines real-world photographs, entity removal, and image-to-video diffusion models to generate training pairs with diverse scenes, viewpoints and correct entity-scene relationships. Furthermore, we introduce a novel correspondence-guided attention loss that leverages…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · 3D Shape Modeling and Analysis
