Image-based Synthesis for Deep 3D Human Pose Estimation
Grégory Rogez, Cordelia Schmid

TL;DR
This paper presents an image-based synthesis engine that generates photorealistic synthetic images with 3D human pose annotations, supplying the training data that CNN-based 3D pose estimation in the wild otherwise lacks.
Contribution
The authors introduce an image synthesis method that combines real images carrying 2D pose annotations with 3D motion capture data to create large datasets annotated with 3D poses. A CNN trained on these synthetic images is applied to real images without domain adaptation.
Findings
Outperforms existing methods on the Human3.6M dataset
Shows promising results on real-world images from the Leeds Sports Pose (LSP) dataset
Synthetic images generalize well to real images
Abstract
This paper addresses the problem of 3D human pose estimation in the wild. A significant challenge is the lack of training data, i.e., 2D images of humans annotated with 3D poses. Such data is necessary to train state-of-the-art CNN architectures. Here, we propose a solution to generate a large set of photorealistic synthetic images of humans with 3D pose annotations. We introduce an image-based synthesis engine that artificially augments a dataset of real images with 2D human pose annotations using 3D motion capture data. Given a candidate 3D pose, our algorithm selects for each joint an image whose 2D pose locally matches the projected 3D pose. The selected images are then combined to generate a new synthetic image by stitching local image patches in a kinematically constrained manner. The resulting images are used to train an end-to-end CNN for full-body 3D pose estimation. We cluster…
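The per-joint selection step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the orthographic projection, the squared-distance cost, and the `neighbors` mapping (which joints count as "local" to a given joint) are all hypothetical choices made here, and the patch-stitching step is not shown.

```python
import numpy as np


def project_orthographic(pose_3d):
    """Project a (J, 3) pose to 2D by dropping depth (assumed camera model)."""
    return pose_3d[:, :2]


def select_image_per_joint(pose_3d, poses_2d, neighbors):
    """Pick, for each joint, the image whose 2D pose matches best locally.

    pose_3d   : (J, 3) candidate 3D pose.
    poses_2d  : (N, J, 2) 2D pose annotations of N real images.
    neighbors : dict mapping joint index -> list of joint indices that
                define its local neighbourhood (hypothetical definition).
    Returns an integer array of length J with one image index per joint.
    """
    target = project_orthographic(pose_3d)
    n_joints = target.shape[0]
    selected = np.empty(n_joints, dtype=int)
    for j in range(n_joints):
        idx = neighbors[j]
        # Centre both poses on joint j so the comparison is local and
        # independent of where the person stands in the image.
        target_local = target[idx] - target[j]
        cand_local = poses_2d[:, idx, :] - poses_2d[:, j:j + 1, :]
        # Squared Euclidean distance over the neighbourhood (assumed cost).
        cost = np.sum((cand_local - target_local) ** 2, axis=(1, 2))
        selected[j] = int(np.argmin(cost))
    return selected
```

Because each joint is matched independently, different joints may draw on different source images, which is what makes the later stitching of local patches necessary.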
