Synthetic Training for Monocular Human Mesh Recovery
Yu Sun, Qian Bao, Wu Liu, Wenpeng Gao, Yili Fu, Chuang Gan, Tao Mei

TL;DR
This paper introduces a fast, single-shot model for 3D human mesh recovery from monocular images, utilizing synthetic training data and a novel depth-to-scale projection to improve supervision and generalization.
Contribution
It presents a multi-branch framework for disentangling body part regressions, synthetic training with unpaired data, and a depth-to-scale projection for better supervision.
Findings
Outperforms previous methods on the CMU Panoptic dataset
Achieves comparable results on the Human3.6M and STB benchmarks
Significantly improves performance on close-shot images with the depth-to-scale (D2S) projection
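To illustrate why close-shot images are a hard case for 2D projection-based supervision, the sketch below compares standard pinhole projection (scale tied to each point's depth) with a weak-perspective approximation (one shared scale). This is generic projection geometry, not the paper's D2S formulation, which this page does not describe. The toy body points, the focal length, and all function names are hypothetical and chosen only for illustration.

```python
import math

def perspective_project(points, focal):
    """Pinhole projection: each point is scaled by focal / z, so scale depends on depth."""
    return [(focal * x / z, focal * y / z) for x, y, z in points]

def weak_perspective_project(points, focal):
    """Weak perspective: a single shared scale, focal / mean depth, for all points."""
    mean_z = sum(z for _, _, z in points) / len(points)
    scale = focal / mean_z
    return [(scale * x, scale * y) for x, y, _ in points]

def shift_depth(points, dz):
    """Move the whole point set dz metres further from the camera."""
    return [(x, y, z + dz) for x, y, z in points]

def max_projection_gap(points, focal):
    """Largest 2D distance (in pixels) between the two projections of the same point."""
    full = perspective_project(points, focal)
    weak = weak_perspective_project(points, focal)
    return max(math.hypot(fx - wx, fy - wy)
               for (fx, fy), (wx, wy) in zip(full, weak))

# Hypothetical toy "body" in metres: a wrist reaching toward the camera,
# the pelvis at the origin, and a foot slightly behind it.
BODY = [(0.3, 0.0, -0.4), (0.0, 0.0, 0.0), (0.1, -0.9, 0.1)]
FOCAL = 1000.0  # assumed focal length in pixels

near = max_projection_gap(shift_depth(BODY, 1.0), FOCAL)  # subject about 1 m away
far = max_projection_gap(shift_depth(BODY, 5.0), FOCAL)   # subject about 5 m away
print(f"gap at 1 m: {near:.1f} px, gap at 5 m: {far:.1f} px")
```

In this toy setup the gap between the two projections is much larger for the near subject than for the far one, which is one plausible reason a projection that ties scale to depth would matter most in close shots.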
Abstract
Recovering 3D human mesh from monocular images is a popular topic in computer vision and has a wide range of applications. This paper aims to estimate 3D mesh of multiple body parts (e.g., body, hands) with large-scale differences from a single RGB image. Existing methods are mostly based on iterative optimization, which is very time-consuming. We propose to train a single-shot model to achieve this goal. The main challenge is lacking training data that have complete 3D annotations of all body parts in 2D images. To solve this problem, we design a multi-branch framework to disentangle the regression of different body properties, enabling us to separate each component's training in a synthetic training manner using unpaired data available. Besides, to strengthen the generalization ability, most existing methods have used in-the-wild 2D pose datasets to supervise the estimated 3D pose via…
Taxonomy
Topics: Human Pose and Action Recognition · Advanced Vision and Imaging · 3D Shape Modeling and Analysis
