MoCap-guided Data Augmentation for 3D Pose Estimation in the Wild
Grégory Rogez, Cordelia Schmid

TL;DR
This paper introduces an image synthesis engine that augments real datasets with photorealistic synthetic images generated from 3D MoCap data, improving 3D human pose estimation on in-the-wild images.
Contribution
The authors propose a new MoCap-guided image synthesis method to create large, diverse training datasets for CNNs, enhancing 3D pose estimation accuracy in real-world scenarios.
Findings
Outperforms the state of the art on the Human3.6M dataset
Shows promising results on the in-the-wild LSP dataset
Demonstrates that CNNs trained on synthetic images generalize to real images
Abstract
This paper addresses the problem of 3D human pose estimation in the wild. A significant challenge is the lack of training data, i.e., 2D images of humans annotated with 3D poses. Such data is necessary to train state-of-the-art CNN architectures. Here, we propose a solution to generate a large set of photorealistic synthetic images of humans with 3D pose annotations. We introduce an image-based synthesis engine that artificially augments a dataset of real images with 2D human pose annotations using 3D Motion Capture (MoCap) data. Given a candidate 3D pose, our algorithm selects for each joint an image whose 2D pose locally matches the projected 3D pose. The selected images are then combined to generate a new synthetic image by stitching local image patches in a kinematically constrained manner. The resulting images are used to train an end-to-end CNN for full-body 3D pose estimation. We…
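The synthesis pipeline in the abstract (project a MoCap pose to 2D, pick one real image per joint by local 2D pose matching, then composite patches) can be sketched as below. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation. The five-joint skeleton `PARENT`, the pinhole parameters `f` and `c`, the "joint plus kinematic neighbors" local distance, and all helper names (`project`, `neighbors`, `select_per_joint`, `stitch`) are hypothetical. In particular, `stitch` assigns each output pixel to its nearest projected joint. This is a naive stand-in that ignores the kinematic constraints the paper applies when stitching.

```python
import numpy as np

# Hypothetical 5-joint kinematic tree, for illustration only:
# 0=pelvis, 1=neck, 2=head, 3=left hand, 4=right hand (-1 marks the root).
PARENT = [-1, 0, 1, 1, 1]


def project(pose3d, f=1000.0, c=(128.0, 128.0)):
    """Assumed pinhole projection of a (J, 3) camera-frame pose to (J, 2) pixels."""
    x, y, z = pose3d[:, 0], pose3d[:, 1], pose3d[:, 2]
    return np.stack([f * x / z + c[0], f * y / z + c[1]], axis=1)


def neighbors(j, parent=PARENT):
    """Joint j followed by its parent (if any) and its children."""
    nbrs = [j]
    if parent[j] >= 0:
        nbrs.append(parent[j])
    nbrs += [k for k, p in enumerate(parent) if p == j]
    return nbrs


def select_per_joint(target2d, dataset2d, parent=PARENT):
    """For each joint, return the index of the dataset pose whose local 2D
    configuration (neighbors expressed relative to that joint, so the match
    is translation-invariant) has the smallest squared distance to the target."""
    choices = []
    for j in range(len(parent)):
        nb = neighbors(j, parent)
        tgt = target2d[nb] - target2d[j]                  # (K, 2)
        cand = dataset2d[:, nb] - dataset2d[:, j:j + 1]   # (N, K, 2)
        dist = ((cand - tgt) ** 2).sum(axis=(1, 2))       # (N,)
        choices.append(int(np.argmin(dist)))
    return choices


def stitch(target2d, images, dataset2d, choices, shape):
    """Naive composite: each output pixel is copied from the image chosen for
    its nearest target joint, translated so that joint's 2D positions align.
    images has shape (N, H, W) or (N, H, W, C)."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs, ys], axis=-1).astype(float)                  # (h, w, 2) as (x, y)
    d = ((pix[:, :, None, :] - target2d[None, None]) ** 2).sum(-1)  # (h, w, J)
    nearest = d.argmin(-1)                                           # (h, w)
    out = np.zeros((h, w) + images.shape[3:], dtype=images.dtype)
    for j, n in enumerate(choices):
        mask = nearest == j
        offset = dataset2d[n, j] - target2d[j]  # source pixel = target pixel + offset
        sx = np.clip(np.round(xs[mask] + offset[0]).astype(int), 0, images.shape[2] - 1)
        sy = np.clip(np.round(ys[mask] + offset[1]).astype(int), 0, images.shape[1] - 1)
        out[mask] = images[n, sy, sx]
    return out
```

For each MoCap frame you would call `project`, then `select_per_joint` against the 2D annotations of the real dataset, then `stitch`. The projected 3D pose then serves as the label of the composite. Making the match local (per joint) rather than global is what lets a small set of real images cover many unseen full-body poses.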
Taxonomy
Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Video Surveillance and Tracking Methods
