TL;DR
This paper introduces an unsupervised method for representing and animating articulated objects. It identifies meaningful object parts and models their motions, which improves animation quality and is preferred by users over prior methods.
Contribution
The paper presents a novel unsupervised approach to articulated object animation that extracts semantically relevant regions and disentangles shape from pose, outperforming previous methods.
Findings
Reaches a 96.6% user preference rate against state-of-the-art methods.
Effectively disentangles shape and pose in motion representations.
Surpasses previous methods on existing benchmarks, especially for articulated objects.
Abstract
We propose novel motion representations for animating articulated objects consisting of distinct parts. In a completely unsupervised manner, our method identifies object parts, tracks them in a driving video, and infers their motions by considering their principal axes. In contrast to previous keypoint-based works, our method extracts meaningful and consistent regions, describing locations, shape, and pose. The regions correspond to semantically relevant and distinct object parts that are more easily detected in frames of the driving video. To force decoupling of foreground from background, we model non-object-related global motion with an additional affine transformation. To facilitate animation and prevent the leakage of the shape of the driving object, we disentangle shape and pose of objects in the region space. Our model can animate a variety of objects, surpassing previous…
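The abstract says part motions are inferred "by considering their principal axes". One plausible reading, sketched below under stated assumptions, is to treat each region as a soft 2D mask, take its weighted mean as the location, and eigendecompose its weighted covariance to get orientation and extent. The function name `region_principal_axes`, the heatmap input, and the choice of scaling each axis by the square root of its eigenvalue are illustrative assumptions, not details confirmed by this summary.

```python
import numpy as np


def region_principal_axes(heatmap):
    """Summarise a soft region mask by its centroid and principal axes.

    Hypothetical helper for illustration only: `heatmap` is assumed to be a
    non-negative (H, W) array with positive total mass.
    Returns (mean, axes), where mean is (x, y) and the columns of `axes`
    are principal directions scaled by the standard deviation along them.
    """
    h, w = heatmap.shape
    weights = (heatmap / heatmap.sum()).ravel()  # normalise to a distribution

    # Pixel coordinates as (x, y) pairs, one row per pixel.
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)

    # Weighted mean gives the region location.
    mean = weights @ coords

    # Weighted covariance captures the region's shape and orientation.
    centered = coords - mean
    cov = (centered * weights[:, None]).T @ centered

    # Eigendecomposition of the symmetric 2x2 covariance (ascending order).
    eigvals, eigvecs = np.linalg.eigh(cov)
    axes = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return mean, axes


# Usage: a horizontal bar should have its dominant axis along x.
bar = np.zeros((9, 21))
bar[4, 5:16] = 1.0
mean, axes = region_principal_axes(bar)
```

Under this assumption, the pair `(mean, axes)` acts as a per-region affine frame: comparing the frames of the same region in two video frames gives a relative rotation, scale, and translation for that part.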
