Learning Variational Motion Prior for Video-based Motion Capture
Xin Chen, Zhuo Su, Lingbo Yang, Pei Cheng, Lan Xu, Bin Fu, and Gang Yu

TL;DR
This paper introduces a variational motion prior framework using a transformer-based autoencoder to improve video-based motion capture, especially in challenging scenarios involving occlusion and complex poses, enabling real-time and stable motion estimation.
Contribution
We propose a novel variational motion prior model with a transformer-based autoencoder and style-mapping, enhancing generalization and real-time performance in video-based motion capture.
Findings
Reduces temporal jittering and failure modes in pose estimation
Achieves real-time motion capture during inference
Demonstrates superior performance on public and in-the-wild datasets
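The jitter claim above can be made concrete with a small sketch. It assumes jitter is quantified as the mean magnitude of per-joint acceleration, estimated with second-order finite differences. This is a common convention in the pose-estimation literature, not necessarily the metric used in the paper. The function name `mean_acceleration`, the frame rate, and the toy trajectories are all hypothetical.

```python
import math


def mean_acceleration(joints, fps=30.0):
    """Mean per-joint acceleration magnitude over a sequence.

    joints: list of frames, each frame a list of (x, y, z) joint positions.
    fps:    assumed frame rate, used to convert frame differences to seconds.
    """
    dt = 1.0 / fps
    total, count = 0.0, 0
    # Second-order central difference needs one frame on each side.
    for t in range(1, len(joints) - 1):
        for prev, cur, nxt in zip(joints[t - 1], joints[t], joints[t + 1]):
            acc = [(n - 2.0 * c + p) / (dt * dt) for p, c, n in zip(prev, cur, nxt)]
            total += math.sqrt(sum(a * a for a in acc))
            count += 1
    return total / count if count else 0.0


# Toy data (hypothetical): one joint moving at constant velocity vs. one that flickers.
smooth = [[(0.1 * t, 0.0, 0.0)] for t in range(5)]
jittery = [[(0.01 * (t % 2), 0.0, 0.0)] for t in range(5)]

smooth_score = mean_acceleration(smooth)
jittery_score = mean_acceleration(jittery)
```

Under this measure a constant-velocity trajectory scores close to zero, while frame-to-frame flicker scores high, so a lower value indicates a more temporally stable estimate.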
Abstract
Motion capture from monocular video is fundamental for people to naturally experience and interact with each other in Virtual Reality (VR) and Augmented Reality (AR). However, existing methods still struggle with challenging cases involving self-occlusion and complex poses because they lack effective motion prior modeling. In this paper, we present a novel variational motion prior (VMP) learning approach for video-based motion capture to resolve this issue. Instead of directly building the correspondence between the video and motion domains, we propose to learn a generic latent space that captures the prior distribution of all natural motions and serves as the basis for subsequent video-based motion capture tasks. To improve the generalization capacity of the prior space, we propose a transformer-based variational autoencoder pretrained over marker-based 3D mocap…
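As a rough illustration of what "variational" means for the prior described above, the sketch below shows the two standard ingredients of any variational autoencoder: reparameterized sampling of a latent code and the KL divergence that pulls the encoder's posterior toward a standard normal prior. It is a generic VAE sketch, not the authors' architecture. The transformer encoder and decoder are omitted, and the names `reparameterize` and `kl_to_standard_normal`, the latent size, and the example values are assumptions.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I), one value per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


# Hypothetical encoder output for one motion clip: a 4-dimensional latent code.
rng = random.Random(0)
mu = [0.3, -0.1, 0.0, 0.5]
log_var = [-1.0, -0.5, 0.0, -2.0]

z = reparameterize(mu, log_var, rng)      # latent code passed to a decoder
kl = kl_to_standard_normal(mu, log_var)   # regularizer added to the reconstruction loss
```

In a full model, `mu` and `log_var` would come from the encoder and `z` would be decoded back into a motion sequence. The KL term is what keeps the latent space close to a known distribution, so that it can act as a prior over motions.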
Taxonomy
Topics: Advanced Vision and Imaging · Human Pose and Action Recognition · Human Motion and Animation
Methods: Temporal Jittering
