From Text to Pose to Image: Improving Diffusion Model Control and Quality

Clément Bonnet; Ariel N. Lee; Franck Wertel; Antoine Tamano; Tanguy Cizain; Pablo Ducru

arXiv:2411.12872·cs.CV·November 25, 2024

TL;DR

This paper introduces a novel text-to-pose-to-image framework that enhances control over pose and image quality in diffusion models, addressing previous limitations in pose diversity and fidelity.

Contribution

The authors develop a text-to-pose generative model, a new sampling algorithm, and an improved pose adapter, enabling better pose control and image quality in diffusion-based image synthesis.

Findings

01 Achieved state-of-the-art pose control in diffusion models

02 Enhanced pose fidelity with more keypoints in the adapter

03 Enabled diverse pose generation from semantic text descriptions

Abstract

In the last two years, text-to-image diffusion models have become extremely popular. As their quality and usage increase, a major concern has been the need for better output control. In addition to prompt engineering, one effective method to improve the controllability of diffusion models has been to condition them on additional modalities such as image style, depth map, or keypoints. This forms the basis of ControlNets or Adapters. When attempting to apply these methods to control human poses in outputs of text-to-image diffusion models, two main challenges have arisen. The first challenge is generating poses following a wide range of semantic text descriptions, for which previous methods involved searching for a pose within a dataset of (caption, pose) pairs. The second challenge is conditioning image generation on a specified pose while keeping both high aesthetic and high pose…
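To make the first challenge concrete, here is a minimal sketch of the retrieval-style baseline the abstract attributes to previous methods: looking up a pose in a dataset of (caption, pose) pairs. Everything in it is an assumption made for illustration, not the authors' code: the toy `DATASET`, the three-keypoint poses, the bag-of-words similarity, and the helper names `bag_of_words`, `cosine_similarity`, and `retrieve_pose` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy dataset of (caption, pose) pairs. Each pose is a short list
# of (x, y) keypoints in normalized image coordinates; the values are made up.
DATASET = [
    ("a person standing with arms raised", [(0.5, 0.1), (0.3, 0.05), (0.7, 0.05)]),
    ("a person sitting on a chair", [(0.5, 0.3), (0.45, 0.6), (0.55, 0.6)]),
    ("a person running on a track", [(0.5, 0.2), (0.35, 0.5), (0.7, 0.55)]),
]


def bag_of_words(text):
    """Represent a caption as lowercase word counts (an illustrative stand-in
    for a learned text embedding)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_pose(prompt, dataset=DATASET):
    """Return the (caption, pose) pair whose caption is most similar to the prompt."""
    query = bag_of_words(prompt)
    return max(dataset, key=lambda pair: cosine_similarity(query, bag_of_words(pair[0])))


caption, pose = retrieve_pose("someone sitting on a chair")
print(caption)
print(pose)
```

A lookup of this kind can only return poses that already exist in the dataset, so a prompt with no close caption still gets some stored pose back. That limitation is what a generative text-to-pose model is meant to address.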

Code & Models

Repositories

clement-bonnet/text-to-pose (PyTorch, official)

Taxonomy

Topics: Semantic Web and Ontologies

Methods: Adapter · Diffusion