RigMo: Unifying Rig and Motion Learning for Generative Animation
Hao Zhang, Jiahao Luo, Bohui Wan, Yizhou Zhao, Zongrui Li, Michael Vasilkovsky, Chaoyang Wang, Jian Wang, Narendra Ahuja, Bing Zhou

TL;DR
RigMo is a novel unified framework that learns rig and motion directly from raw mesh sequences, enabling scalable, interpretable, and physically plausible 3D animation generation without human annotations.
Contribution
RigMo introduces a joint learning approach for rig and motion from raw data, with a new latent space that encodes explicit structure and dynamics, and it improves over existing auto-rigging methods.
Findings
Achieves superior reconstruction accuracy
Learns smooth and interpretable rigs
Demonstrates strong generalization across categories
Abstract
Despite significant progress in 4D generation, rig and motion, the core structural and dynamic components of animation, are typically modeled as separate problems. Existing pipelines rely on ground-truth skeletons and skinning weights for motion generation and treat auto-rigging as an independent process, undermining scalability and interpretability. We present RigMo, a unified generative framework that jointly learns rig and motion directly from raw mesh sequences, without any human-provided rig annotations. RigMo encodes per-vertex deformations into two compact latent spaces: a rig latent that decodes into explicit Gaussian bones and skinning weights, and a motion latent that produces time-varying SE(3) transformations. Together, these outputs define an animatable mesh with explicit structure and coherent motion, enabling feed-forward rig and motion inference for deformable objects.…
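To make the abstract's description concrete, here is a minimal sketch of how Gaussian bones, skinning weights, and per-bone rigid transforms could combine to pose a mesh. It is not the paper's implementation. It assumes isotropic Gaussian bones (a center plus a scalar sigma), weights obtained by normalizing the Gaussian responses over bones, and standard linear blend skinning. All function names and parameters (gaussian_skinning_weights, linear_blend_skinning, sigmas, and so on) are hypothetical.

```python
import math

def gaussian_skinning_weights(vertices, centers, sigmas):
    """Soft-assign each vertex to bones with normalized isotropic Gaussians.

    Assumption: each bone is a center plus a scalar sigma. The paper's
    actual Gaussian bone parameterization may differ.
    """
    weights = []
    for v in vertices:
        raw = []
        for c, s in zip(centers, sigmas):
            d2 = sum((vi - ci) ** 2 for vi, ci in zip(v, c))
            raw.append(math.exp(-d2 / (2.0 * s * s)))
        total = sum(raw)
        weights.append([r / total for r in raw])  # each row sums to 1
    return weights

def rotation_z(theta):
    """3x3 rotation about the z-axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def apply_rigid(R, t, v):
    """Apply one rigid (SE(3)) transform: R @ v + t."""
    return [sum(R[r][k] * v[k] for k in range(3)) + t[r] for r in range(3)]

def linear_blend_skinning(vertices, weights, transforms):
    """Pose each vertex as the weighted sum of per-bone rigid transforms."""
    posed = []
    for v, w in zip(vertices, weights):
        out = [0.0, 0.0, 0.0]
        for w_j, (R, t) in zip(w, transforms):
            p = apply_rigid(R, t, v)
            out = [o + w_j * pk for o, pk in zip(out, p)]
        posed.append(out)
    return posed

# Toy rest pose: three vertices on a line, one bone at each end.
vertices = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
centers = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
sigmas = [0.5, 0.5]
weights = gaussian_skinning_weights(vertices, centers, sigmas)

# One frame of motion: bone 0 stays fixed, bone 1 translates up by 1 in y.
identity = (rotation_z(0.0), [0.0, 0.0, 0.0])
lifted = (rotation_z(0.0), [0.0, 1.0, 0.0])
posed = linear_blend_skinning(vertices, weights, [identity, lifted])
```

In this toy setup the middle vertex is equidistant from both bones, so it receives equal weights and moves halfway with the lifted bone, while the end vertices follow their nearest bone almost entirely. A time-varying motion would supply one list of transforms per frame.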
Taxonomy
Topics: 3D Shape Modeling and Analysis · Human Motion and Animation · Generative Adversarial Networks and Image Synthesis
