Articulate your NeRF: Unsupervised articulated object modeling via conditional view synthesis
Jianning Deng, Kartic Subr, Hakan Bilen

TL;DR
This paper introduces an unsupervised approach for modeling articulated objects. It learns part segmentation and articulation (pose) from two observations in different articulation states, using an implicit model, a voxel grid-based initialization and a decoupled optimization procedure, and it outperforms prior unsupervised methods.
Contribution
It presents a novel unsupervised framework for articulated object modeling. The framework generalizes to objects with multiple parts, needs only a few views of the second articulation state, and improves on previous unsupervised work.
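The following minimal sketch makes the multi-part setting concrete. It shows one plausible way to render a second articulation state from a model of the first. It assumes that each rigid part k has a pose (R_k, t_k) and that a soft part segmentation s_k is defined in the canonical (first-observation) frame. Each query point x is mapped back into the canonical frame once per part, and the per-part densities are blended by the segmentation weights: sigma'(x) = sum_k s_k(x_k) sigma(x_k), with x_k = R_k^T (x - t_k).

Several elements here are illustrative assumptions rather than the authors' implementation:
- the composition rule itself;
- the Gaussian-blob stand-in for the implicit model;
- every name in the code (`canonical_density`, `canonical_segmentation`, `articulated_density`, `PART_CENTERS`, `WIDTH`, `TAU`).

A real renderer would also warp colors and integrate these densities along camera rays.

```python
import numpy as np

# Hypothetical toy stand-in for the implicit model fit to the first observation:
# one Gaussian blob of density per rigid part (assumed, for illustration only).
PART_CENTERS = np.array([[0.0, 0.0, 0.0],
                         [1.0, 0.0, 0.0]])
WIDTH = 0.1  # blob radius (assumed)
TAU = 0.1    # softmax temperature of the toy segmentation (assumed)


def _sq_dist_to_centers(x):
    """Squared distance of each point (N, 3) to each part center -> (N, K)."""
    return ((x[:, None, :] - PART_CENTERS[None]) ** 2).sum(-1)


def canonical_density(x):
    """Toy canonical density sigma(x) for points of shape (N, 3)."""
    return np.exp(-_sq_dist_to_centers(x) / (2 * WIDTH ** 2)).sum(-1)


def canonical_segmentation(x):
    """Toy soft part assignment s_k(x) in the canonical frame; rows sum to 1."""
    logits = -_sq_dist_to_centers(x) / TAU
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)


def to_canonical(x, R, t):
    """Invert x = R x_c + t for row vectors, i.e. x_c = R^T (x - t)."""
    return (x - t) @ R


def articulated_density(x, poses):
    """Second-state density: sum_k s_k(x_k) * sigma(x_k) with x_k = pose_k^{-1}(x)."""
    out = np.zeros(len(x))
    for k, (R, t) in enumerate(poses):
        xc = to_canonical(x, R, t)
        out += canonical_segmentation(xc)[:, k] * canonical_density(xc)
    return out


# Usage: keep part 0 fixed and translate part 1 by +1 along y.
moved = [(np.eye(3), np.zeros(3)), (np.eye(3), np.array([0.0, 1.0, 0.0]))]
d = articulated_density(np.array([[1.0, 1.0, 0.0], [1.0, 0.0, 0.0]]), moved)
print(np.round(d, 3))  # -> [1. 0.]  (part 1 now sits at y = 1; its old spot is empty)
```

With identity poses, the blend reduces to the canonical density because the segmentation weights sum to one. Adding a part only adds one more term to the sum, which is what makes this kind of per-part formulation straightforward to extend beyond two parts.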
Findings
Significantly better part-segmentation and pose-estimation accuracy than prior unsupervised work.
Effective generalization to multi-part objects.
Robust performance with only a few views of the second articulation state.
Abstract
We propose a novel unsupervised method to learn the pose and part segmentation of articulated objects with rigid parts. Given two observations of an object in different articulation states, our method first learns the geometry and appearance of the object parts from the first observation using an implicit model. It then distils the part segmentation and articulation from the second observation while rendering that observation. To tackle the complexities of jointly optimizing part segmentation and articulation, we also propose a voxel grid-based initialization strategy and a decoupled optimization procedure. Compared to prior unsupervised work, our model obtains significantly better performance and generalizes to objects with multiple parts. It can also be learned efficiently from a few views of the second observation.
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Human Motion and Animation
