From Text to Pose to Image: Improving Diffusion Model Control and Quality
Clément Bonnet, Ariel N. Lee, Franck Wertel, Antoine Tamano, Tanguy Cizain, Pablo Ducru

TL;DR
This paper introduces a novel text-to-pose-to-image framework that enhances control over pose and image quality in diffusion models, addressing previous limitations in pose diversity and fidelity.
Contribution
The authors develop a text-to-pose generative model, a new sampling algorithm, and an improved pose adapter, enabling better pose control and image quality in diffusion-based image synthesis.
Findings
Achieved state-of-the-art pose control in diffusion models
Enhanced pose fidelity with more keypoints in the adapter
Enabled diverse pose generation from semantic text descriptions
Abstract
In the last two years, text-to-image diffusion models have become extremely popular. As their quality and usage increase, a major concern has been the need for better output control. In addition to prompt engineering, one effective method to improve the controllability of diffusion models has been to condition them on additional modalities such as image style, depth map, or keypoints. This forms the basis of ControlNets or Adapters. When attempting to apply these methods to control human poses in outputs of text-to-image diffusion models, two main challenges have arisen. The first challenge is generating poses following a wide range of semantic text descriptions, for which previous methods involved searching for a pose within a dataset of (caption, pose) pairs. The second challenge is conditioning image generation on a specified pose while keeping both high aesthetic and high pose…
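The dataset-lookup approach that the abstract attributes to previous methods can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the bag-of-words `embed` function stands in for a real text encoder, and `DATASET`, `retrieve_pose`, and the three-keypoint poses are hypothetical names and values, not taken from the paper.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding (assumed stand-in for a real text encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_pose(query, dataset):
    """Return the pose paired with the caption most similar to the query."""
    q = embed(query)
    _, best_pose = max(dataset, key=lambda pair: cosine(q, embed(pair[0])))
    return best_pose


# Hypothetical (caption, pose) pairs; each pose is a list of (x, y) keypoints.
DATASET = [
    ("a person sitting on a chair", [(0.5, 0.2), (0.5, 0.5), (0.6, 0.8)]),
    ("a person jumping with arms raised", [(0.5, 0.1), (0.3, 0.0), (0.7, 0.0)]),
]

pose = retrieve_pose("someone jumping", DATASET)
```

A lookup of this kind can only return poses that already exist in the dataset, which is one way to read the pose-diversity limitation the abstract describes.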
Taxonomy
Topics: Semantic Web and Ontologies
Methods: Adapter · Diffusion
