ProxyCap: Real-time Monocular Full-body Capture in World Space via Human-Centric Proxy-to-Motion Learning
Yuxiang Zhang, Hongwen Zhang, Liangxiao Hu, Jiajun Zhang, Hongwei Yi, Shengping Zhang, Yebin Liu

TL;DR
ProxyCap introduces a real-time monocular full-body capture system that leverages human-centric proxy-to-motion learning, enabling accurate and physically plausible world-space motion estimation from monocular videos with moving cameras.
Contribution
It presents a novel human-centric proxy-to-motion learning scheme and a contact-aware neural motion descent module for real-time, accurate world-space full-body capture.
Findings
Claims the first real-time monocular full-body capture with physically plausible foot-ground contact in world space.
Achieves accurate world-space motion estimation from monocular videos.
Demonstrates robustness with hand-held moving cameras.
Abstract
Learning-based approaches to monocular motion capture have recently shown promising results by learning to regress in a data-driven manner. However, due to the challenges in data collection and network designs, it remains challenging for existing solutions to achieve real-time full-body capture while being accurate in world space. In this work, we introduce ProxyCap, a human-centric proxy-to-motion learning scheme to learn world-space motions from a proxy dataset of 2D skeleton sequences and 3D rotational motions. Such proxy data enables us to build a learning-based network with accurate world-space supervision while also mitigating the generalization issues. For more accurate and physically plausible predictions in world space, our network is designed to learn human motions from a human-centric perspective, which enables the understanding of the same motion captured with different…
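The abstract describes proxy data that pairs 2D skeleton sequences with 3D motions. As a rough illustration only, the sketch below builds one such pair by projecting world-space 3D joints through a virtual pinhole camera. It is not the authors' pipeline: the names `project_joints` and `make_proxy_pair`, the translation-only camera, and the use of joint positions instead of rotational motion parameters are all assumptions made for brevity.

```python
from typing import Dict, List, Tuple

Point3D = Tuple[float, float, float]
Point2D = Tuple[float, float]


def project_joints(joints_cam: List[Point3D], focal: float,
                   cx: float, cy: float) -> List[Point2D]:
    """Pinhole projection of camera-space 3D joints to pixel coordinates."""
    projected = []
    for x, y, z in joints_cam:
        if z <= 0:
            raise ValueError("joint must lie in front of the camera (z > 0)")
        projected.append((focal * x / z + cx, focal * y / z + cy))
    return projected


def make_proxy_pair(joints_world: List[Point3D], cam_position: Point3D,
                    focal: float, cx: float, cy: float) -> Dict[str, list]:
    """Pair world-space joints with their 2D projection (hypothetical helper).

    Assumption: the virtual camera is axis-aligned with the world frame, so
    the world-to-camera transform is a pure translation by -cam_position.
    """
    tx, ty, tz = cam_position
    joints_cam = [(x - tx, y - ty, z - tz) for x, y, z in joints_world]
    return {
        "proxy_2d": project_joints(joints_cam, focal, cx, cy),
        "joints_world": list(joints_world),
    }


# Two joints one metre apart, seen by a camera placed 2 m behind the origin.
pair = make_proxy_pair([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                       cam_position=(0.0, 0.0, -2.0),
                       focal=1000.0, cx=500.0, cy=500.0)
```

Sampling different virtual camera positions for the same 3D motion would yield multiple 2D skeleton sequences tied to a single world-space target, which is the kind of supervision the abstract refers to.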
Taxonomy
Topics: Human Pose and Action Recognition · Human Motion and Animation · Video Surveillance and Tracking Methods
Methods: Attentive Walk-Aggregating Graph Neural Network
