Improving face generation quality and prompt following with synthetic captions

Michail Tarasiou; Stylianos Moschoglou; Jiankang Deng; Stefanos Zafeiriou

arXiv:2405.10864·cs.CV·May 20, 2024

Open Access

TL;DR

This paper introduces a training-free pipeline that generates synthetic captions for face datasets; fine-tuning diffusion models on these captions significantly improves the realism and prompt adherence of generated faces.

Contribution

The authors propose a novel, training-free method to create synthetic captions for face images, enhancing diffusion models' ability to generate realistic human faces aligned with prompts.

Findings

01 Improved face generation quality and realism

02 Enhanced prompt adherence in generated images

03 Synthetic captions effectively fine-tune diffusion models

Abstract

Recent advancements in text-to-image generation using diffusion models have significantly improved the quality of generated images and expanded the ability to depict a wide range of objects. However, ensuring that these models adhere closely to the text prompts remains a considerable challenge. This issue is particularly pronounced when trying to generate photorealistic images of humans. Without significant prompt engineering efforts, models often produce unrealistic images and typically fail to incorporate the full extent of the prompt information. This limitation can be largely attributed to the nature of the captions accompanying the images used to train large-scale diffusion models, which typically prioritize contextual information over details related to the person's appearance. In this paper, we address this issue by introducing a training-free pipeline designed to generate accurate…

Taxonomy

Topics: Subtitles and Audiovisual Media · Video Analysis and Summarization

Methods: Diffusion