Unsupervised Learning of Object Structure and Dynamics from Videos
Matthias Minderer, Chen Sun, Ruben Villegas, Forrester Cole, Kevin, Murphy, Honglak Lee

TL;DR
This paper introduces an unsupervised approach that learns object structure and motion dynamics from videos using keypoints, enabling stable prediction and improved object-level understanding across diverse datasets.
Contribution
It proposes a novel keypoint-based stochastic dynamics model that improves video prediction and object motion understanding without supervision.
Findings
Outperforms unstructured representations in pixel-level video prediction.
Enhances object tracking, action recognition, and reward prediction.
Effective across multiple datasets including sports, human motion, and control tasks.
Abstract
Extracting and predicting object structure and dynamics from videos without supervision is a major challenge in machine learning. To address this challenge, we adopt a keypoint-based image representation and learn a stochastic dynamics model of the keypoints. Future frames are reconstructed from the keypoints and a reference frame. By modeling dynamics in the keypoint coordinate space, we achieve stable learning and avoid compounding of errors in pixel space. Our method improves upon unstructured representations both for pixel-level video prediction and for downstream tasks requiring object-level understanding of motion dynamics. We evaluate our model on diverse datasets: a multi-agent sports dataset, the Human3.6M dataset, and datasets based on continuous control tasks from the DeepMind Control Suite. The spatially structured representation outperforms unstructured representations on a…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsHuman Pose and Action Recognition · Anomaly Detection Techniques and Applications · Multimodal Machine Learning Applications
