The Chosen One: Consistent Characters in Text-to-Image Diffusion Models

Omri Avrahami; Amir Hertz; Yael Vinker; Moab Arar; Shlomi Fruchter; Ohad Fried; Daniel Cohen-Or; Dani Lischinski

arXiv:2311.10093 · cs.CV · July 16, 2024 · 1 citation

Open Access · 1 repository · 1 dataset

TL;DR

This paper presents an automated method for generating consistent characters in text-to-image diffusion models using only text prompts, improving identity consistency without manual input.

Contribution

The authors introduce an iterative, fully automated approach for consistent character generation from text prompts, enhancing identity coherence in generated images.

Findings

1. Better balance between prompt alignment and identity consistency
2. Quantitative analysis shows improved results over baselines
3. User study confirms effectiveness of the method

Abstract

Recent advances in text-to-image generation models have unlocked vast potential for visual creativity. However, users of these models struggle to generate consistent characters, a crucial capability for numerous real-world applications such as story visualization, game development, asset design, and advertising. Current methods typically rely on multiple pre-existing images of the target character or involve labor-intensive manual processes. In this work, we propose a fully automated solution for consistent character generation, with the sole input being a text prompt. We introduce an iterative procedure that, at each stage, identifies a coherent set of images sharing a similar identity and extracts a more consistent identity from this set. Our quantitative analysis demonstrates that our method strikes a better balance between prompt alignment and identity…
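
The "identify a coherent set of images" step can be illustrated with a small sketch. The abstract does not say how the coherent set is found, so everything below is an assumption made for illustration: images are taken to be already embedded as feature vectors, the grouping uses plain k-means with a deterministic farthest-first initialization, and cohesion is scored as the mean distance of members to their own centroid. The function names (`farthest_first_init`, `kmeans`, `most_cohesive_cluster`) and the toy 2-D "embeddings" are hypothetical. The identity-extraction step that follows in the paper is not shown.

```python
import numpy as np


def farthest_first_init(points, k):
    """Pick k starting centroids: the first point, then repeatedly the point
    farthest from all centroids chosen so far (assumed init, deterministic)."""
    chosen = [0]
    for _ in range(k - 1):
        # Distance from every point to its nearest already-chosen centroid.
        d = np.linalg.norm(
            points[:, None, :] - points[chosen][None, :, :], axis=2
        ).min(axis=1)
        chosen.append(int(d.argmax()))
    return points[chosen].copy()


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centroids = farthest_first_init(points, k)
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        updated = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels


def most_cohesive_cluster(embeddings, k=3, min_size=2):
    """Return (member indices, spread) of the tightest cluster, where spread
    is the mean distance of members to their own centroid (assumed score)."""
    labels = kmeans(embeddings, k)
    best, best_spread = None, np.inf
    for j in range(k):
        members = np.flatnonzero(labels == j)
        if len(members) < min_size:
            continue  # ignore clusters too small to define an identity
        center = embeddings[members].mean(axis=0)
        spread = np.linalg.norm(embeddings[members] - center, axis=1).mean()
        if spread < best_spread:
            best, best_spread = members, spread
    return best, best_spread


# Toy stand-in for image embeddings: one tight group and two looser groups.
embeddings = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],          # tight group
    [10.0, 10.0], [12.0, 10.0], [10.0, 12.0], [12.0, 12.0],  # looser group
    [-10.0, 10.0], [-13.0, 10.0], [-10.0, 13.0], [-13.0, 13.0],
])
members, spread = most_cohesive_cluster(embeddings, k=3)
print("selected image indices:", members.tolist())
```

In an iterative setting, the selected indices would point at the images kept for the next stage; how many clusters to use and when to stop are further choices the abstract leaves open.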

Code & Models

Repositories

ZichengDuan/TheChosenOne
pytorch

Datasets

Minusone/subject_motion
dataset · 27 downloads

Taxonomy

Topics: Human Motion and Animation · Video Analysis and Summarization · Generative Adversarial Networks and Image Synthesis

Methods: Sparse Evolutionary Training · Diffusion