MoCaNet: Motion Retargeting in-the-wild via Canonicalization Networks
Wentao Zhu, Zhuoqian Yang, Ziang Di, Wayne Wu, Yizhou Wang, Chen Change Loy

TL;DR
This paper introduces MoCaNet, a framework that retargets human motion from in-the-wild 2D videos to 3D characters. It relies on canonicalization networks and is trained without supervision, so it needs no 3D annotations.
Contribution
It proposes a canonicalization-based approach that disentangles a skeleton sequence into motion, structure, and view, enabling 3D motion retargeting from monocular videos without supervision.
Findings
Achieves superior motion transfer performance on benchmarks.
Effectively disentangles motion, structure, and view for interpretability.
Operates without 3D annotations or motion-body pairing.
Abstract
We present a novel framework that brings the 3D motion retargeting task from controlled environments to in-the-wild scenarios. In particular, our method is capable of retargeting body motion from a character in a 2D monocular video to a 3D character without using any motion capture system or 3D reconstruction procedure. It is designed to leverage massive online videos for unsupervised training, needless of 3D annotations or motion-body pairing information. The proposed method is built upon two novel canonicalization operations, structure canonicalization and view canonicalization. Trained with the canonicalization operations and the derived regularizations, our method learns to factorize a skeleton sequence into three independent semantic subspaces, i.e., motion, structure, and view angle. The disentangled representation enables motion retargeting from 2D to 3D with high precision. Our…
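The three-way factorization described in the abstract can be illustrated with a toy example. The sketch below is not the paper's method: MoCaNet learns its factors with networks, whereas this code builds them by hand. It assumes, purely for illustration, that "structure" is a vector of bone lengths, "motion" is a sequence of unit bone directions, and "view" is a yaw angle followed by an orthographic projection. All names (`PARENTS`, `forward_kinematics`, `yaw_rotation`, `project`, `random_motion`) are hypothetical.

```python
import numpy as np

# Toy 4-joint chain: joint 0 is the root, each joint's parent is the previous one.
PARENTS = [-1, 0, 1, 2]


def forward_kinematics(directions, bone_lengths):
    """Combine motion and structure into 3D joints.

    directions:   (T, J-1, 3) unit bone directions per frame (toy "motion").
    bone_lengths: (J-1,) bone lengths (toy "structure").
    Returns joints of shape (T, J, 3), with the root at the origin.
    """
    num_frames = directions.shape[0]
    joints = np.zeros((num_frames, len(PARENTS), 3))
    for j in range(1, len(PARENTS)):
        offset = directions[:, j - 1] * bone_lengths[j - 1]
        joints[:, j] = joints[:, PARENTS[j]] + offset
    return joints


def yaw_rotation(theta):
    """Rotation matrix about the vertical (y) axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def project(joints_3d, theta):
    """Apply the toy "view": rotate by theta, then drop the depth axis."""
    rotated = joints_3d @ yaw_rotation(theta).T
    return rotated[..., :2]


def random_motion(num_frames, rng):
    """Random unit bone directions, standing in for a real motion clip."""
    d = rng.normal(size=(num_frames, len(PARENTS) - 1, 3))
    return d / np.linalg.norm(d, axis=-1, keepdims=True)


rng = np.random.default_rng(0)
source_motion = random_motion(5, rng)          # motion taken from the source
target_structure = np.array([0.5, 0.5, 0.3])   # bone lengths of the target body

# Retargeting in this toy setting: source motion combined with target structure.
retargeted_3d = forward_kinematics(source_motion, target_structure)
retargeted_2d = project(retargeted_3d, theta=np.pi / 6)
```

Because the three factors are independent inputs here, swapping any one of them (a different body, a different clip, a different camera angle) leaves the other two untouched, which is the property the disentangled representation is meant to provide.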
Taxonomy
Topics: Human Pose and Action Recognition · Advanced Vision and Imaging · Human Motion and Animation
