MoCapAnything: Unified 3D Motion Capture for Arbitrary Skeletons from Monocular Videos

Kehong Gong; Zhengyu Wen; Weixia He; Mingxi Xu; Qi Wang; Ning Zhang; Zhengyu Li; Dongze Lian; Wei Zhao; Xiaoyu He; and Mingyuan Zhang

arXiv:2512.10881·cs.CV·May 1, 2026

TL;DR

MoCapAnything is a unified framework for 3D motion capture of arbitrary skeletons from monocular videos. It uses a reference-guided, factorized approach and comes with a new dataset and cross-species retargeting capabilities.

Contribution

MoCapAnything introduces a category-agnostic motion capture system that reconstructs animations for any rigged asset from monocular videos, rather than being tied to a single species or template.

Findings

01 Achieves high-quality skeletal animations across diverse rigs.

02 Demonstrates effective cross-species retargeting in in-the-wild videos.

03 Outperforms existing methods on in-domain benchmarks.

Abstract

Motion capture now underpins content creation far beyond digital humans, yet most existing pipelines remain species- or template-specific. We formalize this gap as Category-Agnostic Motion Capture (CAMoCap): given a monocular video and an arbitrary rigged 3D asset as a prompt, the goal is to reconstruct a rotation-based animation such as BVH that directly drives the specific asset. We present MoCapAnything, a reference-guided, factorized framework that first predicts 3D joint trajectories and then recovers asset-specific rotations via constraint-aware inverse kinematics. The system contains three learnable modules and a lightweight IK stage: (1) a Reference Prompt Encoder that extracts per-joint queries from the asset's skeleton, mesh, and rendered images; (2) a Video Feature Extractor that computes dense visual descriptors and reconstructs a coarse 4D deforming mesh to bridge the gap…
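The abstract describes the second stage only at a high level: rotations are recovered from predicted 3D joint trajectories via constraint-aware inverse kinematics. As a rough illustration of the general idea of turning joint positions into rotations, the sketch below computes, for a single bone in a single frame, the shortest-arc quaternion that turns the bone's rest-pose offset toward the predicted parent-to-child direction. This is not the paper's IK stage: it ignores joint constraints, twist, and hierarchy, and all function names here (`shortest_arc_quaternion`, `rotate`, `bone_rotation`) are hypothetical.

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    if n == 0.0:
        raise ValueError("zero-length vector")
    return tuple(c / n for c in v)


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def shortest_arc_quaternion(rest_dir, target_dir):
    """Unit quaternion (w, x, y, z) rotating rest_dir onto target_dir."""
    a = normalize(rest_dir)
    b = normalize(target_dir)
    d = dot(a, b)
    if d < -1.0 + 1e-9:
        # Opposite directions: rotate 180 degrees about any axis
        # perpendicular to a.
        helper = (1.0, 0.0, 0.0) if abs(a[0]) < 0.9 else (0.0, 1.0, 0.0)
        axis = normalize(cross(a, helper))
        return (0.0, axis[0], axis[1], axis[2])
    axis = cross(a, b)
    q = (1.0 + d, axis[0], axis[1], axis[2])
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)


def rotate(q, v):
    """Rotate 3-vector v by unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    u = (x, y, z)
    uv = cross(u, v)
    uuv = cross(u, uv)
    # v' = v + 2w (u x v) + 2 u x (u x v)
    return tuple(v[i] + 2.0 * (w * uv[i] + uuv[i]) for i in range(3))


def bone_rotation(parent_pos, child_pos, rest_offset):
    """Rotation aligning a bone's rest offset with predicted joint positions.

    Assumption: positions and rest_offset are in the same coordinate frame.
    """
    target = tuple(c - p for c, p in zip(child_pos, parent_pos))
    return shortest_arc_quaternion(rest_offset, target)


if __name__ == "__main__":
    # A bone that points along +Y at rest, observed pointing along +X.
    q = bone_rotation((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
    print(q, rotate(q, (0.0, 1.0, 0.0)))
```

A practical solver would run along the whole joint hierarchy, express each rotation in the parent's local frame, and enforce the asset's joint limits, which is presumably where the "constraint-aware" part of the described IK stage comes in.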

Code & Models

Repositories

animotionlab/MoCapAnything (GitHub)
