Toward Human Understanding with Controllable Synthesis
Hanz Cuevas-Velasquez, Priyanka Patel, Haiwen Feng, Michael Black

TL;DR
This paper introduces a controllable synthesis method that balances realism and ground truth accuracy in generated images, improving training data quality for 3D human pose and shape estimation.
Contribution
It presents a novel approach combining generative models with traditional graphics to produce realistic yet accurate training data for HPS estimation.
Findings
Generated images with controlled realism improve HPS accuracy.
Balancing realism and ground truth is crucial for effective training data.
The proposed method outperforms purely synthetic or purely realistic approaches.
Abstract
Training methods to perform robust 3D human pose and shape (HPS) estimation requires diverse training images with accurate ground truth. While BEDLAM demonstrates the potential of traditional procedural graphics to generate such data, the training images are clearly synthetic. In contrast, generative image models produce highly realistic images but without ground truth. Putting these methods together seems straightforward: use a generative model with the body ground truth as the controlling signal. However, we find that the more realistic the generated images are, the more they deviate from the ground truth, making them inappropriate for training and evaluation. Enhancing realistic details, such as clothing and facial expressions, can introduce subtle yet significant deviations from the ground truth, potentially misleading the models being trained. We empirically verify that this misalignment…
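One way to make the misalignment described in the abstract concrete is to compare the 2D joint locations implied by the ground-truth body with the joints actually visible in the generated image. The sketch below is only an illustration under stated assumptions: the function name `mean_joint_deviation`, the pixel coordinates, and the idea that the second set of joints comes from some 2D keypoint detector are all hypothetical, and this is not the paper's evaluation protocol.

```python
import math


def mean_joint_deviation(gt_joints, detected_joints):
    """Mean Euclidean distance (in pixels) between paired 2D joints.

    gt_joints: (x, y) positions projected from the ground-truth body.
    detected_joints: (x, y) positions found in the generated image
    (assumed to come from an external keypoint detector).
    """
    if len(gt_joints) != len(detected_joints):
        raise ValueError("joint lists must have the same length")
    if not gt_joints:
        raise ValueError("need at least one joint")
    total = 0.0
    for (gx, gy), (dx, dy) in zip(gt_joints, detected_joints):
        total += math.hypot(gx - dx, gy - dy)
    return total / len(gt_joints)


# Hypothetical pixel coordinates, chosen only for illustration.
gt = [(100.0, 50.0), (120.0, 80.0), (90.0, 130.0)]
detected = [(103.0, 54.0), (120.0, 80.0), (90.0, 136.0)]

deviation = mean_joint_deviation(gt, detected)
print(f"mean joint deviation: {deviation:.2f} px")
```

A larger value means the generated image has drifted further from the body it was conditioned on, which is the kind of deviation the abstract says makes such images unsuitable as labeled training data.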
Taxonomy
Topics: Robotics and Automated Systems
