Unsupervised Discovery of Parts, Structure, and Dynamics
Zhenjia Xu, Zhijian Liu, Chen Sun, Kevin Murphy, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu

TL;DR
This paper introduces a novel unsupervised model that learns hierarchical object parts, their structure, and dynamics from unlabeled videos, mimicking human perception of object motion and structure.
Contribution
The PSD model simultaneously learns object parts, their hierarchical structure, and motion dynamics from unlabeled videos, integrating segmentation, structure, and prediction tasks.
Findings
Segments object parts from unlabeled videos on both real and synthetic datasets
Recovers the hierarchical structure that relates those parts
Captures the distribution of part motions when predicting the future
Abstract
Humans easily recognize object parts and their hierarchical structure by watching how they move; they can then predict how each part moves in the future. In this paper, we propose a novel formulation that simultaneously learns a hierarchical, disentangled object representation and a dynamics model for object parts from unlabeled videos. Our Parts, Structure, and Dynamics (PSD) model learns to, first, recognize the object parts via a layered image representation; second, predict hierarchy via a structural descriptor that composes low-level concepts into a hierarchical structure; and third, model the system dynamics by predicting the future. Experiments on multiple real and synthetic datasets demonstrate that our PSD model works well on all three tasks: segmenting object parts, building their hierarchical structure, and capturing their motion distributions.
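To make the idea of a structural descriptor more concrete, here is a minimal sketch of how a hierarchy over parts could turn per-part local motions into global motions. It assumes each part carries a single 2D displacement relative to its parent and that the hierarchy is given as a binary ancestor matrix. The function name `compose_global_motion`, the variable names, and the torso/arm example are hypothetical illustrations, not taken from the paper or its code.

```python
import numpy as np


def compose_global_motion(local_motion, is_ancestor):
    """Combine local part motions into global motions using a hierarchy.

    local_motion: (K, 2) array, displacement of each part relative to its parent.
    is_ancestor:  (K, K) binary array, is_ancestor[i, k] == 1 if part i is an
                  ancestor of part k (assumed encoding, for illustration only).
    Returns a (K, 2) array: each part's own motion plus that of all its ancestors.
    """
    local_motion = np.asarray(local_motion, dtype=float)
    is_ancestor = np.asarray(is_ancestor, dtype=float)
    # Row k of is_ancestor.T selects every ancestor i of part k.
    return local_motion + is_ancestor.T @ local_motion


# Hypothetical three-part chain: torso (0) -> upper arm (1) -> forearm (2).
is_ancestor = np.array([
    [0, 1, 1],  # torso is an ancestor of both arm parts
    [0, 0, 1],  # upper arm is an ancestor of the forearm
    [0, 0, 0],  # forearm has no descendants
])
local_motion = np.array([
    [1.0, 0.0],  # torso shifts right
    [0.0, 2.0],  # upper arm moves relative to the torso
    [0.5, 0.0],  # forearm moves relative to the upper arm
])

global_motion = compose_global_motion(local_motion, is_ancestor)
print(global_motion)
```

In this toy setup the forearm inherits the motion of both the torso and the upper arm, which is the kind of dependency a learned hierarchy is meant to capture. In the PSD model the parts and the structure are learned from video rather than supplied by hand as they are here.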
Taxonomy
Topics: Human Pose and Action Recognition · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
