MoCapAnything V2: End-to-End Motion Capture for Arbitrary Skeletons

Kehong Gong; Zhengyu Wen; Dao Thien Phong; Mingxi Xu; Weixia He; Qi Wang; Ning Zhang; Zhengyu Li; Guanli Hou; Dongze Lian; Xiaoyu He; Mingyuan Zhang; Hanwang Zhang

arXiv:2604.28130·cs.CV·May 15, 2026

TL;DR

This paper introduces MoCapAnything V2, an end-to-end motion capture framework that jointly learns pose and rotation estimation from monocular video, improving accuracy and efficiency over prior factorized methods.

Contribution

It presents the first fully end-to-end trainable system for arbitrary-skeleton motion capture. The system uses a reference pose to improve rotation prediction and estimates joint positions directly.

Findings

01

Reduces rotation error from ~17° to ~10°, and reaches 6.54° on unseen skeletons.

02

Achieves ~20x faster inference than mesh-based pipelines.

03

Improves robustness and efficiency by predicting joint positions directly from video.

Abstract

Recent methods for arbitrary-skeleton motion capture from monocular video follow a factorized pipeline, where a Video-to-Pose network predicts joint positions and an analytical inverse-kinematics (IK) stage recovers joint rotations. While effective, this design is inherently limited, since joint positions do not fully determine rotations and leave degrees of freedom such as bone-axis twist ambiguous, and the non-differentiable IK stage prevents the system from adapting to noisy predictions or optimizing for the final animation objective. In this work, we present the first fully end-to-end framework in which both Video-to-Pose and Pose-to-Rotation are learnable and jointly optimized. We observe that the ambiguity in pose-to-rotation mapping arises from missing coordinate system information: the same joint positions can correspond to different rotations under different rest poses and…
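The bone-axis twist ambiguity mentioned in the abstract can be illustrated with a small numeric sketch. This is not the paper's code: the single-bone setup, the helper names (rotation_about_axis, apply, matmul), and the chosen angles are all hypothetical. It only shows that two different joint rotations can place the child joint at the same position, so positions alone cannot pin down the rotation.

```python
import math

def rotation_about_axis(axis, angle):
    """Rodrigues' formula: 3x3 matrix rotating by `angle` radians about `axis`."""
    norm = math.sqrt(sum(a * a for a in axis))
    x, y, z = (a / norm for a in axis)
    c, s = math.cos(angle), math.sin(angle)
    t = 1.0 - c
    return [
        [c + x * x * t,     x * y * t - z * s, x * z * t + y * s],
        [y * x * t + z * s, c + y * y * t,     y * z * t - x * s],
        [z * x * t - y * s, z * y * t + x * s, c + z * z * t],
    ]

def apply(R, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]

def matmul(A, B):
    """Multiply two 3x3 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

# Hypothetical bone: parent joint at the origin, child offset along +Y at rest.
rest_offset = [0.0, 1.0, 0.0]

# Rotation A: swing the bone 90 degrees about the Z axis.
swing = rotation_about_axis([0.0, 0.0, 1.0], math.pi / 2)

# Rotation B: first twist 60 degrees about the bone's own axis, then swing.
twist = rotation_about_axis(rest_offset, math.radians(60))
swing_twist = matmul(swing, twist)

child_a = apply(swing, rest_offset)
child_b = apply(swing_twist, rest_offset)

# The child joint lands in the same place under both rotations...
positions_match = all(
    math.isclose(a, b, abs_tol=1e-9) for a, b in zip(child_a, child_b)
)
# ...even though the two rotation matrices are different.
rotations_match = all(
    math.isclose(swing[i][j], swing_twist[i][j], abs_tol=1e-9)
    for i in range(3) for j in range(3)
)

print("same child position:", positions_match)
print("same rotation:", rotations_match)
```

In this toy setup the twist leaves the bone direction untouched, so any position-only objective treats both rotations as equally good. That matches the abstract's point that a stage working from joint positions alone needs extra information to resolve the twist.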
