Neural Marionette: Unsupervised Learning of Motion Skeleton and Latent Dynamics from Volumetric Video
Jinseok Bae, Hojun Jang, Cheol-Hui Min, Hyungun Choi, Young Min Kim

TL;DR
Neural Marionette is an unsupervised method that discovers skeletal structure and learns latent motion dynamics from volumetric video (point cloud sequences), enabling diverse motion generation without prior knowledge of the underlying motion or skeleton.
Contribution
It introduces an unsupervised approach that infers skeletal structure and motion priors directly from volumetric video data, with no manual skeleton annotations; the discovered structure is reported to be comparable to a hand-labeled ground-truth skeleton.
Findings
The discovered skeletal structure is comparable to the hand-labeled ground-truth skeleton in representing a 4D motion sequence.
Learned motion priors enable diverse and interpolated motion generation.
Generalizes to motion retargeting across different skeletons.
Abstract
We present Neural Marionette, an unsupervised approach that discovers the skeletal structure from a dynamic sequence and learns to generate diverse motions that are consistent with the observed motion dynamics. Given a video stream of point cloud observation of an articulated body under arbitrary motion, our approach discovers the unknown low-dimensional skeletal relationship that can effectively represent the movement. Then the discovered structure is utilized to encode the motion priors of dynamic sequences in a latent structure, which can be decoded to the relative joint rotations to represent the full skeletal motion. Our approach works without any prior knowledge of the underlying motion or skeletal structure, and we demonstrate that the discovered structure is even comparable to the hand-labeled ground truth skeleton in representing a 4D sequence of motion. The skeletal structure…
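The abstract says the latent structure is decoded to relative joint rotations that represent the full skeletal motion. As a rough illustration of what that decoding target means, the following is a minimal sketch of standard forward kinematics, which composes per-joint relative rotations along a skeleton tree to obtain joint positions. It is not the authors' implementation: the function names, the `parents`/`offsets` layout, and the three-joint chain are all hypothetical, chosen only for this example.

```python
import math

def rot_z(theta):
    """3x3 rotation matrix about the z-axis (illustrative helper)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(a, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

def forward_kinematics(parents, offsets, local_rots, root_pos=(0.0, 0.0, 0.0)):
    """Compose relative joint rotations along a skeleton tree.

    parents[j]    : index of joint j's parent (-1 for the root);
                    parents are assumed to precede their children
    offsets[j]    : rest-pose offset of joint j from its parent,
                    expressed in the parent's frame
    local_rots[j] : rotation of joint j relative to its parent
    Returns a list of world-space joint positions.
    """
    n = len(parents)
    world_rots = [None] * n
    positions = [None] * n
    for j in range(n):
        p = parents[j]
        if p < 0:
            # The root joint carries the global orientation and position.
            world_rots[j] = local_rots[j]
            positions[j] = list(root_pos)
        else:
            # Accumulate rotation down the chain, then place the joint
            # by rotating its rest offset with the parent's world rotation.
            world_rots[j] = matmul(world_rots[p], local_rots[j])
            step = matvec(world_rots[p], offsets[j])
            positions[j] = [positions[p][i] + step[i] for i in range(3)]
    return positions

# Hypothetical 3-joint chain: root -> elbow -> hand, unit-length bones along x.
parents = [-1, 0, 1]
offsets = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
# Bend the middle joint by 90 degrees; the other joints stay at rest.
local_rots = [rot_z(0.0), rot_z(math.pi / 2), rot_z(0.0)]
positions = forward_kinematics(parents, offsets, local_rots)
```

In this toy chain, rotating only the middle joint swings the last bone from the x-axis to the y-axis, so a small set of relative rotations is enough to describe the pose of the whole chain.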
Taxonomy
Topics: Human Pose and Action Recognition · Human Motion and Animation · Advanced Vision and Imaging
