The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
Omri Avrahami, Amir Hertz, Yael Vinker, Moab Arar, Shlomi Fruchter, Ohad Fried, Daniel Cohen-Or, Dani Lischinski

TL;DR
This paper presents an automated method for generating consistent characters in text-to-image diffusion models using only text prompts, improving identity consistency without manual input.
Contribution
The authors introduce an iterative, fully automated approach for consistent character generation from text prompts, enhancing identity coherence in generated images.
Findings
Better balance between prompt alignment and identity consistency
Quantitative analysis shows improved results over baselines
User study confirms effectiveness of the method
Abstract
Recent advances in text-to-image generation models have unlocked vast potential for visual creativity. However, users of these models struggle with the generation of consistent characters, a crucial aspect for numerous real-world applications such as story visualization, game development, asset design, advertising, and more. Current methods typically rely on multiple pre-existing images of the target character or involve labor-intensive manual processes. In this work, we propose a fully automated solution for consistent character generation, with the sole input being a text prompt. We introduce an iterative procedure that, at each stage, identifies a coherent set of images sharing a similar identity and extracts a more consistent identity from this set. Our quantitative analysis demonstrates that our method strikes a better balance between prompt alignment and identity…
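The iterative procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the callbacks `generate`, `embed`, `cluster`, and `extract` are hypothetical placeholders for image generation, feature extraction, clustering, and identity extraction. The cohesion measure (mean squared distance to the cluster centroid), the `min_size` threshold, and the fixed number of rounds are assumptions made for the example.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable, Sequence

Vector = Sequence[float]


def centroid(points: Sequence[Vector]) -> list[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def cohesion(points: Sequence[Vector]) -> float:
    """Mean squared distance to the centroid (lower means more cohesive)."""
    c = centroid(points)
    total = sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in points)
    return total / len(points)


def most_cohesive_cluster(
    embeddings: Sequence[Vector], labels: Sequence[int], min_size: int = 2
) -> list[int]:
    """Return the indices of the tightest cluster with at least min_size members."""
    groups: dict[int, list[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        groups[label].append(idx)
    candidates = [idxs for idxs in groups.values() if len(idxs) >= min_size]
    if not candidates:
        raise ValueError("no cluster has at least min_size members")
    return min(candidates, key=lambda idxs: cohesion([embeddings[i] for i in idxs]))


def refine_identity(
    generate: Callable, embed: Callable, cluster: Callable, extract: Callable,
    identity=None, rounds: int = 3,
):
    """Repeat: generate images, pick the most coherent set, extract an identity.

    All four callbacks are hypothetical; a fixed round count stands in for
    whatever stopping rule a real system would use.
    """
    for _ in range(rounds):
        images = generate(identity)                  # images for the current identity
        embeddings = [embed(img) for img in images]  # one feature vector per image
        labels = cluster(embeddings)                 # one cluster label per image
        chosen = most_cohesive_cluster(embeddings, labels)
        identity = extract([images[i] for i in chosen])
    return identity
```

For example, with embeddings `[[0, 0], [0.1, 0], [5, 5], [9, 9], [7, 5]]` and labels `[0, 0, 1, 1, 1]`, `most_cohesive_cluster` selects the two tightly grouped points at indices 0 and 1.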
Taxonomy
Topics: Human Motion and Animation · Video Analysis and Summarization · Generative Adversarial Networks and Image Synthesis
Methods: Sparse Evolutionary Training · Diffusion
