Controlling Human Shape and Pose in Text-to-Image Diffusion Models via Domain Adaptation

Benito Buchheim; Max Reimann; Jürgen Döllner

arXiv:2411.04724·cs.CV·November 8, 2024

Open Access

TL;DR

This paper introduces a domain-adaptation method for text-to-image diffusion models that enables detailed human shape and pose control using synthetic data, improving diversity and fidelity in generated images.

Contribution

It proposes a domain-adaptation technique that preserves image quality while controlling human shape and pose through a ControlNet-based architecture fine-tuned on synthetic data.

Findings

01 Achieves greater shape and pose diversity than 2D pose-based ControlNet.

02 Maintains high visual fidelity and stability in generated images.

03 Proves useful for downstream human animation tasks.

Abstract

We present a methodology for conditional control of human shape and pose in pretrained text-to-image diffusion models using a 3D human parametric model (SMPL). Fine-tuning these diffusion models to adhere to new conditions requires large datasets and high-quality annotations, which can be more cost-effectively acquired through synthetic data generation rather than real-world data. However, the domain gap and low scene diversity of synthetic data can compromise the pretrained model's visual fidelity. We propose a domain-adaptation technique that maintains image quality by isolating synthetically trained conditional information in the classifier-free guidance vector and composing it with another control network to adapt the generated images to the input domain. To achieve SMPL control, we fine-tune a ControlNet-based architecture on the synthetic SURREAL dataset of rendered humans and…
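The guidance composition mentioned in the abstract can be illustrated with a small sketch. The first function is the standard classifier-free guidance update, which combines an unconditional and a conditional noise prediction. The second function is only one plausible reading of "isolating synthetically trained conditional information in the classifier-free guidance vector": it takes the guidance difference from a synthetically trained branch and adds it to the prediction of a separate branch that stays in the target image domain. The function names, the argument names, and the exact form of the composition are assumptions made for illustration and are not taken from the paper.

```python
from typing import List

Vector = List[float]


def guidance_vector(eps_cond: Vector, eps_base: Vector) -> Vector:
    """Element-wise difference between a conditional and a baseline noise prediction."""
    return [c - b for c, b in zip(eps_cond, eps_base)]


def classifier_free_guidance(eps_uncond: Vector, eps_cond: Vector, scale: float) -> Vector:
    """Standard classifier-free guidance: eps_uncond + scale * (eps_cond - eps_uncond)."""
    g = guidance_vector(eps_cond, eps_uncond)
    return [u + scale * gi for u, gi in zip(eps_uncond, g)]


def compose_with_domain_branch(
    eps_domain: Vector,
    eps_synth_uncond: Vector,
    eps_synth_cond: Vector,
    scale: float,
) -> Vector:
    """Hypothetical composition (an assumption, not the paper's stated formula).

    Only the guidance difference of the synthetically trained branch is used,
    and it is added to the prediction of a branch that keeps the output in the
    target image domain.
    """
    g = guidance_vector(eps_synth_cond, eps_synth_uncond)
    return [d + scale * gi for d, gi in zip(eps_domain, g)]


if __name__ == "__main__":
    # Toy two-dimensional "noise predictions" stand in for real network outputs.
    print(classifier_free_guidance([0.0, 1.0], [1.0, 3.0], scale=2.0))
    print(compose_with_domain_branch([10.0, 10.0], [0.0, 1.0], [1.0, 3.0], scale=2.0))
```

In this reading, the synthetic branch contributes only a direction (the difference between its conditional and unconditional predictions), so its rendered-image appearance does not enter the base prediction directly. Whether the paper uses exactly this form cannot be determined from the truncated abstract.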

Taxonomy

Topics: Image Retrieval and Classification Techniques · Human Motion and Animation · Face recognition and analysis

Methods: Diffusion