SpotActor: Training-Free Layout-Controlled Consistent Image Generation

Jiahao Wang; Caixia Yan; Weizhan Zhang; Haonan Lin; Mengmeng Wang; Guang Dai; Tieliang Gong; Hao Sun; Jingdong Wang

arXiv:2409.04801·cs.CV·September 10, 2024

Open Access

TL;DR

SpotActor is a training-free, layout-controlled image generation method. It places each subject in its specified layout region and keeps each subject's appearance consistent across images, targeting applications such as artistic creation and comic production.

Contribution

It introduces the Layout-to-Consistent-Image (L2CI) generation task, formalizes it as dual energy guidance, and proposes a training-free pipeline that includes new attention mechanisms.

Findings

01

Achieves superior layout alignment and subject consistency.

02

Demonstrates effectiveness through comprehensive experiments.

03

Outperforms existing methods in prompt conformity and background diversity.

Abstract

Text-to-image diffusion models significantly enhance the efficiency of artistic creation with high-fidelity image generation. However, in typical application scenarios like comic book production, they can neither place each subject into its expected spot nor maintain the consistent appearance of each subject across images. To address these issues, we pioneer a novel task, Layout-to-Consistent-Image (L2CI) generation, which produces consistent and compositional images in accordance with the given layout conditions and text prompts. To accomplish this challenging task, we present a new formalization of dual energy guidance with optimization in a dual semantic-latent space and thus propose a training-free pipeline, SpotActor, which features a layout-conditioned backward update stage and a consistent forward sampling stage. In the backward stage, we design a nuanced layout energy function to…
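The abstract is cut off before it defines SpotActor's layout energy, so the sketch below does not reproduce it. It instead shows the general idea of a layout-conditioned backward update. A generic box-constrained attention energy is minimized by plain gradient descent, in the spirit of earlier training-free layout-guidance work.

Everything here is an assumption for illustration. Each subject token's attention over an H×W grid is modelled directly by a vector of logits, a toy stand-in for a denoising network's cross-attention. The energy is Σ_k (1 − attention mass of subject k inside its box)². The names `box_mask`, `layout_energy_and_grad` and `backward_update` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical box format: (top, left, bottom, right), with bottom/right exclusive.
Box = Tuple[int, int, int, int]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a flat list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def box_mask(height: int, width: int, box: Box) -> List[float]:
    """Flattened (row-major) binary mask that is 1.0 inside the box, 0.0 elsewhere."""
    top, left, bottom, right = box
    return [1.0 if top <= r < bottom and left <= c < right else 0.0
            for r in range(height) for c in range(width)]


def layout_energy_and_grad(logits: List[List[float]], masks: List[List[float]]):
    """Toy layout energy sum_k (1 - s_k)^2 and its gradient w.r.t. the logits.

    s_k is the softmax attention mass of subject k that falls inside its box.
    The logits are a hypothetical stand-in for real cross-attention scores.
    """
    energy = 0.0
    grads = []
    for z, mask in zip(logits, masks):
        attn = softmax(z)
        inside = sum(a * m for a, m in zip(attn, mask))
        energy += (1.0 - inside) ** 2
        # dE/dA_i = -2 (1 - s) m_i
        g = [-2.0 * (1.0 - inside) * m for m in mask]
        # Softmax Jacobian-vector product: dE/dz_j = A_j (g_j - <g, A>)
        g_dot_a = sum(gi * ai for gi, ai in zip(g, attn))
        grads.append([ai * (gi - g_dot_a) for ai, gi in zip(attn, g)])
    return energy, grads


def backward_update(logits, masks, step_size=0.1, steps=500):
    """Plain gradient descent that pulls each subject's attention into its box."""
    z = [row[:] for row in logits]
    for _ in range(steps):
        _, grads = layout_energy_and_grad(z, masks)
        z = [[zi - step_size * gi for zi, gi in zip(row, grow)]
             for row, grow in zip(z, grads)]
    return z


if __name__ == "__main__":
    H, W = 8, 8
    boxes = [(0, 0, 4, 4), (4, 4, 8, 8)]  # hypothetical layout for two subjects
    masks = [box_mask(H, W, b) for b in boxes]
    rng = random.Random(0)
    logits = [[rng.gauss(0.0, 1.0) for _ in range(H * W)] for _ in boxes]
    before, _ = layout_energy_and_grad(logits, masks)
    after, _ = layout_energy_and_grad(backward_update(logits, masks), masks)
    print(f"layout energy before: {before:.4f}, after: {after:.4f}")
```

In an actual diffusion pipeline, the gradient of such an energy would be backpropagated through the denoising network to the noisy latent rather than applied to attention logits directly. The closed-form softmax gradient here only keeps the toy self-contained. The small step size is a deliberately conservative choice so that each descent step reduces the energy.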

Taxonomy

Topics: Computer Graphics and Visualization Techniques · Augmented Reality Applications · Advanced Image and Video Retrieval Techniques

Methods: Softmax · Attention Is All You Need · Diffusion