Synthetic Training for Accurate 3D Human Pose and Shape Estimation in the Wild
Akash Sengupta, Ignas Budvytis, Roberto Cipolla

TL;DR
This paper introduces STRAPS, a system that improves 3D human shape and pose estimation from monocular RGB images. It addresses the scarcity of training data with accurate body shape labels by training on synthetic data, and it uses data augmentation to bridge the gap between synthetic training inputs and noisy real inputs.
Contribution
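The paper presents STRAPS, a synthetic training framework in which a shape and pose regression network takes proxy representations (silhouettes and 2D joints) as input and is trained on synthetic data generated on-the-fly with the SMPL statistical body model, so that it does not depend on scarce in-the-wild data with accurate shape labels.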
Findings
STRAPS outperforms existing methods on the SSP-3D dataset in shape prediction accuracy.
It remains competitive with state-of-the-art methods on pose estimation metrics.
Synthetic training with data augmentation effectively bridges the gap between synthetic and real inputs.
Abstract
This paper addresses the problem of monocular 3D human shape and pose estimation from an RGB image. Despite great progress in this field in terms of pose prediction accuracy, state-of-the-art methods often predict inaccurate body shapes. We suggest that this is primarily due to the scarcity of in-the-wild training data with diverse and accurate body shape labels. Thus, we propose STRAPS (Synthetic Training for Real Accurate Pose and Shape), a system that utilises proxy representations, such as silhouettes and 2D joints, as inputs to a shape and pose regression neural network, which is trained with synthetic training data (generated on-the-fly during training using the SMPL statistical body model) to overcome data scarcity. We bridge the gap between synthetic training inputs and noisy real inputs, which are predicted by keypoint detection and segmentation CNNs at test-time, by using data…
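The idea of corrupting clean synthetic proxy inputs so they resemble noisy test-time predictions can be illustrated with a minimal sketch. The abstract text available here is truncated and does not list the specific augmentations, so everything below is an assumption for illustration: the function names `corrupt_joints` and `occlude_silhouette`, the parameters `noise_std`, `drop_prob`, and `box_size`, and the choice of Gaussian jitter, joint dropping, and box occlusion are hypothetical and are not taken from the paper.

```python
import random


def corrupt_joints(joints, noise_std=2.0, drop_prob=0.1, rng=None):
    """Jitter 2D joints with Gaussian noise and randomly mark some as missing.

    joints: list of (x, y) pixel coordinates (hypothetical input format).
    Returns a list of (x, y, visible) tuples; dropped joints get visible=0.
    """
    rng = rng or random.Random()
    out = []
    for x, y in joints:
        if rng.random() < drop_prob:
            # Simulate a joint the keypoint detector failed to find.
            out.append((0.0, 0.0, 0))
        else:
            out.append((x + rng.gauss(0.0, noise_std),
                        y + rng.gauss(0.0, noise_std), 1))
    return out


def occlude_silhouette(mask, box_size, rng=None):
    """Zero out one random square region of a binary silhouette mask.

    mask: list of rows, each a list of 0/1 values. Returns a new mask and
    leaves the input unchanged.
    """
    rng = rng or random.Random()
    h, w = len(mask), len(mask[0])
    top = rng.randrange(0, max(1, h - box_size + 1))
    left = rng.randrange(0, max(1, w - box_size + 1))
    out = [row[:] for row in mask]
    for r in range(top, min(h, top + box_size)):
        for c in range(left, min(w, left + box_size)):
            out[r][c] = 0
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    clean_joints = [(10.0, 20.0), (30.0, 40.0), (50.0, 60.0)]
    clean_mask = [[1] * 8 for _ in range(8)]
    print(corrupt_joints(clean_joints, rng=rng))
    print(occlude_silhouette(clean_mask, box_size=3, rng=rng))
```

In a training loop of this kind, such corruptions would be applied to each synthetic sample before it reaches the regression network, so that the network sees inputs closer to what keypoint detection and segmentation CNNs produce at test-time.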
Taxonomy
Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Human Motion and Animation
