Unsupervised Discovery of Parts, Structure, and Dynamics

Zhenjia Xu; Zhijian Liu; Chen Sun; Kevin Murphy; William T. Freeman; Joshua B. Tenenbaum; Jiajun Wu

arXiv:1903.05136 · cs.CV · March 14, 2019 · 25 cites

TL;DR

This paper introduces a novel unsupervised model that learns hierarchical object parts, their structure, and dynamics from unlabeled videos, mimicking human perception of object motion and structure.

Contribution

The PSD model simultaneously learns object parts, their hierarchical structure, and motion dynamics from unlabeled videos, integrating segmentation, structure, and prediction tasks.

Findings

01. Effective in segmenting object parts
02. Successfully builds hierarchical structures
03. Accurately models motion dynamics

Abstract

Humans easily recognize object parts and their hierarchical structure by watching how they move; they can then predict how each part moves in the future. In this paper, we propose a novel formulation that simultaneously learns a hierarchical, disentangled object representation and a dynamics model for object parts from unlabeled videos. Our Parts, Structure, and Dynamics (PSD) model learns to, first, recognize the object parts via a layered image representation; second, predict hierarchy via a structural descriptor that composes low-level concepts into a hierarchical structure; and third, model the system dynamics by predicting the future. Experiments on multiple real and synthetic datasets demonstrate that our PSD model works well on all three tasks: segmenting object parts, building their hierarchical structure, and capturing their motion distributions.
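The abstract does not spell out how a hierarchical structure over parts would interact with their motion. As a rough illustration only, and not the PSD model itself, the sketch below assumes that a hierarchy is given as a parent index per part and that each part carries a 2D local motion relative to its parent; a part's global motion is then its local motion plus the local motion of every ancestor. The function name `compose_global_motion`, the parent-index encoding, and the three-part example are all hypothetical choices made for this sketch.

```python
from typing import List, Optional, Tuple

Vec = Tuple[float, float]


def compose_global_motion(parent: List[Optional[int]], local: List[Vec]) -> List[Vec]:
    """Sum each part's local motion with the local motion of all its ancestors.

    parent[i] is the index of part i's parent, or None for a root part.
    local[i] is part i's motion relative to its parent (assumed 2D here).
    """
    if len(parent) != len(local):
        raise ValueError("parent and local must have the same length")

    result: List[Vec] = []
    for i in range(len(parent)):
        dx, dy = local[i]
        seen = {i}
        j = parent[i]
        # Walk up the hierarchy, accumulating each ancestor's local motion.
        while j is not None:
            if j in seen:
                raise ValueError("hierarchy contains a cycle")
            seen.add(j)
            dx += local[j][0]
            dy += local[j][1]
            j = parent[j]
        result.append((dx, dy))
    return result


# Hypothetical three-part chain: torso (root) -> upper leg -> lower leg.
parent = [None, 0, 1]
local = [(1.0, 0.0), (0.0, 0.5), (0.0, 0.25)]
global_motion = compose_global_motion(parent, local)
print(global_motion)
```

In this toy example the lower leg inherits the torso's horizontal motion and the upper leg's vertical motion on top of its own, which is the intuition behind treating parts as a hierarchy rather than as independent segments.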

Taxonomy

Topics: Human Pose and Action Recognition · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications