SpotActor: Training-Free Layout-Controlled Consistent Image Generation
Jiahao Wang, Caixia Yan, Weizhan Zhang, Haonan Lin, Mengmeng Wang, Guang Dai, Tieliang Gong, Hao Sun, Jingdong Wang

TL;DR
SpotActor is a training-free, layout-controlled image generation method that places each subject at its specified position and keeps its appearance consistent across images, targeting uses such as artistic creation and comic production.
Contribution
The paper introduces the Layout-to-Consistent-Image (L2CI) generation task, formalizes it as dual energy guidance, and proposes a training-free pipeline that includes new attention mechanisms.
Findings
Achieves superior layout alignment and subject consistency.
Demonstrates effectiveness through comprehensive experiments.
Outperforms existing methods in prompt conformity and background diversity.
Abstract
Text-to-image diffusion models significantly enhance the efficiency of artistic creation with high-fidelity image generation. However, in typical application scenarios like comic book production, they can neither place each subject into its expected spot nor maintain the consistent appearance of each subject across images. For these issues, we pioneer a novel task, Layout-to-Consistent-Image (L2CI) generation, which produces consistent and compositional images in accordance with the given layout conditions and text prompts. To accomplish this challenging task, we present a new formalization of dual energy guidance with optimization in a dual semantic-latent space and thus propose a training-free pipeline, SpotActor, which features a layout-conditioned backward update stage and a consistent forward sampling stage. In the backward stage, we innovate a nuanced layout energy function to…
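The abstract is cut off before it defines the layout energy function, so the following is only a toy sketch of the general idea behind an energy-guided backward update, not SpotActor's formulation. It assumes a hypothetical setup in which a subject's attention map is the softmax of a small grid of logits `z`, the layout condition is a binary box `mask`, and the energy is the squared attention mass falling outside the box. The names `layout_energy`, `energy_grad`, and `backward_update`, the step size, and the step count are all illustrative.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a flat vector.
    e = np.exp(z - z.max())
    return e / e.sum()

def layout_energy(z, mask):
    # Assumed stand-in energy: (1 - attention mass inside the box)^2.
    p = softmax(z.ravel())
    inside = float(p @ mask.ravel())
    return (1.0 - inside) ** 2, inside, p

def energy_grad(z, mask):
    # Analytic gradient of the energy with respect to the logits:
    # dE/dz_i = -2 * (1 - s) * p_i * (m_i - s), where s is the inside mass.
    _, inside, p = layout_energy(z, mask)
    m = mask.ravel()
    grad = -2.0 * (1.0 - inside) * p * (m - inside)
    return grad.reshape(z.shape)

def backward_update(z, mask, steps=50, lr=5.0):
    # Plain gradient descent on the logits (hypothetical hyperparameters).
    z = z.copy()
    for _ in range(steps):
        z -= lr * energy_grad(z, mask)
    return z

rng = np.random.default_rng(0)
z0 = rng.normal(size=(8, 8))      # toy logits standing in for a latent
mask = np.zeros((8, 8))
mask[2:5, 3:7] = 1.0              # target box for the subject

z1 = backward_update(z0, mask)
e0, in0, _ = layout_energy(z0, mask)
e1, in1, _ = layout_energy(z1, mask)
print(f"inside mass: {in0:.3f} -> {in1:.3f}, energy: {e0:.3f} -> {e1:.3f}")
```

Each step raises the logits inside the box and lowers those outside it, so the attention mass inside the box grows and the energy shrinks. In a real diffusion pipeline the gradient would be taken through the denoising network's attention layers with automatic differentiation rather than derived by hand.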
Taxonomy
Topics: Computer Graphics and Visualization Techniques · Augmented Reality Applications · Advanced Image and Video Retrieval Techniques
Methods: Softmax · Attention Is All You Need · Diffusion
