Controlling Human Shape and Pose in Text-to-Image Diffusion Models via Domain Adaptation
Benito Buchheim, Max Reimann, Jürgen Döllner

TL;DR
This paper introduces a domain-adaptation method for text-to-image diffusion models that enables detailed human shape and pose control using synthetic data, improving diversity and fidelity in generated images.
Contribution
It proposes a novel domain-adaptation technique that preserves image quality while controlling human shape and pose through a ControlNet-based architecture fine-tuned on synthetic data.
Findings
Achieves greater shape and pose diversity than a 2D pose-based ControlNet.
Maintains high visual fidelity and stability in generated images.
Proves useful for downstream human animation tasks.
Abstract
We present a methodology for conditional control of human shape and pose in pretrained text-to-image diffusion models using a 3D human parametric model (SMPL). Fine-tuning these diffusion models to adhere to new conditions requires large datasets and high-quality annotations, which can be more cost-effectively acquired through synthetic data generation rather than real-world data. However, the domain gap and low scene diversity of synthetic data can compromise the pretrained model's visual fidelity. We propose a domain-adaptation technique that maintains image quality by isolating synthetically trained conditional information in the classifier-free guidance vector and composing it with another control network to adapt the generated images to the input domain. To achieve SMPL control, we fine-tune a ControlNet-based architecture on the synthetic SURREAL dataset of rendered humans and…
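The abstract does not give the exact composition formula, so the following is only a minimal sketch, under our own assumptions, of how conditional information can be isolated in a classifier-free guidance (CFG) vector. The idea it illustrates: the guidance vector of a synthetically trained branch, eps(x, t, c) minus eps(x, t, null), cancels whatever that branch predicts identically with and without its condition (such as a synthetic rendering bias). Only the difference is then added on top of a second, domain-adapting branch. All names here (`guidance_vector`, `composed_eps`, `domain_model`, `smpl_model`, the weights `w_text`/`w_smpl`, and the toy predictors) are hypothetical and do not come from the paper. This is not the authors' implementation.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical interface (assumption): a noise predictor eps(x_t, t, condition).
# condition=None stands for the null (unconditional) input used in CFG.
NoisePredictor = Callable[[Sequence[float], int, Optional[str]], List[float]]


def guidance_vector(model: NoisePredictor, x: Sequence[float], t: int,
                    cond: str) -> List[float]:
    """CFG direction for one model: eps(x, t, cond) - eps(x, t, null)."""
    eps_cond = model(x, t, cond)
    eps_null = model(x, t, None)
    return [c - n for c, n in zip(eps_cond, eps_null)]


def composed_eps(domain_model: NoisePredictor, smpl_model: NoisePredictor,
                 x: Sequence[float], t: int, prompt: str, smpl_cond: str,
                 w_text: float, w_smpl: float) -> List[float]:
    """Sketch: domain branch prediction plus only the SMPL branch's CFG vector.

    Terms the synthetically trained SMPL branch predicts identically with and
    without its condition cancel inside its guidance vector, so only the
    shape/pose signal is added to the domain branch.
    """
    base = domain_model(x, t, None)
    text_dir = guidance_vector(domain_model, x, t, prompt)
    smpl_dir = guidance_vector(smpl_model, x, t, smpl_cond)
    return [b + w_text * d_t + w_smpl * d_s
            for b, d_t, d_s in zip(base, text_dir, smpl_dir)]


# Toy linear predictors for illustration only (hypothetical).
def toy_domain(x: Sequence[float], t: int, cond: Optional[str]) -> List[float]:
    # +1.0 when a text prompt is given.
    return [0.5 * xi + (1.0 if cond else 0.0) for xi in x]


def toy_smpl(x: Sequence[float], t: int, cond: Optional[str]) -> List[float]:
    # Constant +3.0 mimics a synthetic-domain bias; +2.0 is the "pose" signal.
    return [0.25 * xi + 3.0 + (2.0 if cond else 0.0) for xi in x]


if __name__ == "__main__":
    print(guidance_vector(toy_smpl, [1.0, 2.0], 10, "smpl"))  # [2.0, 2.0]
    print(composed_eps(toy_domain, toy_smpl, [1.0, 2.0], t=10,
                       prompt="a person", smpl_cond="smpl",
                       w_text=2.0, w_smpl=1.0))  # [4.5, 5.0]
```

In the toy run, the synthetic bias of 3.0 never reaches the output: the SMPL branch contributes only its conditional difference of 2.0 per element, while the base prediction and text guidance come from the domain branch.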
Taxonomy
Topics: Image Retrieval and Classification Techniques · Human Motion and Animation · Face Recognition and Analysis
Methods: Diffusion
