ViPS: Video-informed Pose Spaces for Auto-Rigged Meshes
Honglin Chen, Karran Pandey, Rundi Wu, Matheus Gadelha, Yannick Hold-Geoffroy, Ayush Tewari, Niloy J. Mitra, Changxi Zheng, Paul Guerrero

TL;DR
ViPS is a video-informed framework that learns a plausible, controllable pose space for auto-rigged meshes, enabling diverse, realistic animations without relying on scarce, artist-authored 4D datasets.
Contribution
ViPS transfers generative video model priors into a universal distribution over the rig parameterization of a given auto-rigged mesh, improving plausibility, diversity, and generalization in 3D pose modeling.
Findings
ViPS matches state-of-the-art models trained on 4D data in plausibility and diversity.
It enables sampling, inverse kinematics, and temporally coherent animations.
ViPS generalizes well to unseen species and skeletal topologies.
Abstract
Kinematic rigs provide a structured interface for articulating 3D meshes but lack any associated pose space, i.e., an explicit representation of the plausible manifold of joint configurations for a given mesh. Without such a pose space, stochastic sampling or manual manipulation of raw rig parameters easily results in semantic and/or geometric violations, such as anatomical hyperextension and non-physical self-intersections. We propose Video-informed Pose Spaces (ViPS), a feedforward framework that discovers the latent distribution of valid articulations for auto-rigged meshes by distilling motion priors from a pretrained video diffusion model. Unlike existing methods that rely on scarce, artist-authored 4D datasets, or focus on reconstructing instances of individual motions, ViPS transfers generative video model priors into a universal distribution over the given rig parameterization.…
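To make the motivating problem concrete, the toy sketch below samples raw joint angles uniformly and counts how many samples fall inside per-joint limits. It is not the ViPS method: the three-joint limb, the limit values in `JOINT_LIMITS`, and the helper names are all hypothetical, and a simple limit check stands in for the far richer notion of plausibility that a learned pose space would capture.

```python
import math
import random

# Hypothetical joint limits in radians for a toy 3-joint limb.
# These values are illustrative assumptions, not taken from the paper.
JOINT_LIMITS = [(-0.5, 2.0), (0.0, 2.4), (-0.7, 0.7)]


def sample_raw_pose(rng, n_joints=3):
    """Sample every joint angle uniformly from the full range [-pi, pi]."""
    return [rng.uniform(-math.pi, math.pi) for _ in range(n_joints)]


def is_plausible(pose, limits=JOINT_LIMITS):
    """Crude plausibility proxy: each angle must lie within its joint limit."""
    return all(lo <= angle <= hi for angle, (lo, hi) in zip(pose, limits))


def plausible_fraction(n_samples=10000, seed=0):
    """Fraction of uniformly sampled raw poses that pass the limit check."""
    rng = random.Random(seed)
    hits = sum(is_plausible(sample_raw_pose(rng)) for _ in range(n_samples))
    return hits / n_samples


if __name__ == "__main__":
    print(f"plausible fraction: {plausible_fraction():.3f}")
```

With these assumed limits, the chance that a uniform sample passes all three checks is (2.5 · 2.4 · 1.4) / (2π)³, roughly 3.4%, so nearly all raw samples are hyperextended even for a rig this small. The check also ignores self-intersection and dependencies between joints, which is part of why a learned distribution over rig parameters is attractive.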
