World-Grounded Human Motion Recovery via Gravity-View Coordinates
Zehong Shen, Huaijin Pi, Yan Xia, Zhi Cen, Sida Peng, Zechen Hu, Hujun Bao, Ruizhen Hu, Xiaowei Zhou

TL;DR
This paper introduces a Gravity-View coordinate system for monocular human motion recovery. Estimating poses per frame in this system reduces coordinate ambiguity and avoids the error accumulation of autoregressive methods, yielding more accurate and realistic world-grounded motion from video.
Contribution
The novel Gravity-View coordinate system enables direct, gravity-aligned pose estimation per frame, improving accuracy and robustness over autoregressive methods.
Findings
Outperforms state-of-the-art in accuracy and speed
Reduces error accumulation in motion sequences
Produces more realistic world-grounded human motion
Abstract
We present a novel method for recovering world-grounded human motion from monocular video. The main challenge lies in the ambiguity of defining the world coordinate system, which varies between sequences. Previous approaches attempt to alleviate this issue by predicting relative motion in an autoregressive manner, but are prone to accumulating errors. Instead, we propose estimating human poses in a novel Gravity-View (GV) coordinate system, which is defined by the world gravity and the camera view direction. The proposed GV system is naturally gravity-aligned and uniquely defined for each video frame, largely reducing the ambiguity of learning image-pose mapping. The estimated poses can be transformed back to the world coordinate system using camera rotations, forming a global motion sequence. Additionally, the per-frame estimation avoids error accumulation in the autoregressive…
Taxonomy
Methods: Gravity
