MoViD: View-Invariant 3D Human Pose Estimation via Motion-View Disentanglement
Yejia Liu, Hengle Jiang, Haoxian Liu, Runxi Huang, Xiaomin Ouyang

TL;DR
MoViD is a viewpoint-invariant 3D human pose estimation framework that disentangles view and motion features, improving robustness, efficiency, and real-time performance across diverse datasets.
Contribution
The paper introduces MoViD, a novel approach that disentangles viewpoint information from motion features for improved 3D pose estimation, especially under unseen viewpoints and limited data.
Findings
Reduces pose estimation error by over 24.2% compared to state-of-the-art methods.
Maintains robust performance with 60% less training data.
Achieves real-time inference at 15 FPS on NVIDIA edge devices.
Abstract
3D human pose estimation is a key enabling technology for applications such as healthcare monitoring, human-robot collaboration, and immersive gaming, but real-world deployment remains challenged by viewpoint variations. Existing methods struggle to generalize to unseen camera viewpoints, require large amounts of training data, and suffer from high inference latency. We propose MoViD, a viewpoint-invariant 3D human pose estimation framework that disentangles viewpoint information from motion features. The key idea is to extract viewpoint information from intermediate pose features and leverage it to enhance both the robustness and efficiency of pose estimation. MoViD introduces a view estimator that models key joint relationships to predict viewpoint information, and an orthogonal projection module to disentangle motion and view features, further enhanced through physics-grounded…
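The abstract names an orthogonal projection module but this summary does not describe its internals. As a minimal sketch of the general idea only, assuming each sample has a feature vector and a predicted view direction of the same dimension, the view component can be removed by vector rejection. The names `project_out_view`, `features`, and `view_dirs` are hypothetical and are not taken from the paper.

```python
import numpy as np

def project_out_view(features, view_dirs, eps=1e-8):
    """Split each feature vector into a part along its view direction
    and a residual part orthogonal to it (generic vector rejection).

    Assumption: features and view_dirs both have shape (batch, dim).
    This is an illustration, not the MoViD module itself.
    """
    # Normalize the view directions; eps guards against zero-length vectors.
    norms = np.linalg.norm(view_dirs, axis=-1, keepdims=True)
    unit = view_dirs / np.maximum(norms, eps)
    # Scalar projection of each feature onto its view direction.
    coeff = np.sum(features * unit, axis=-1, keepdims=True)
    view_part = coeff * unit
    # The residual is orthogonal to the view direction by construction.
    motion_part = features - view_part
    return motion_part, view_part

# Toy data standing in for intermediate pose features.
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 16))
view_dirs = rng.normal(size=(4, 16))
motion, view = project_out_view(features, view_dirs)
```

In this sketch the two parts sum back to the original feature and their inner product is zero up to floating-point error, which is the property an orthogonality-based disentanglement relies on.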
