OCK: Unsupervised Dynamic Video Prediction with Object-Centric Kinematics

Yeon-Ji Song; Jaein Kim; Suhyung Choi; Jin-Hwa Kim; Byoung-Tak Zhang

arXiv:2404.18423·cs.CV·July 22, 2025

Open Access

TL;DR

The paper introduces OCK, an unsupervised model that enhances dynamic video prediction by explicitly modeling object kinematics alongside appearance, improving understanding of complex multi-object scenes.

Contribution

The paper presents a novel object kinematics component integrated into object-centric transformers, enabling better modeling of object motion dynamics in video prediction.

Findings

01 Superior performance on complex scene prediction tasks

02 Effective modeling of object motion and interactions

03 Long-term spatiotemporal prediction accuracy

Abstract

Human perception involves decomposing complex multi-object scenes into time-static object appearance (i.e., size, shape, color) and time-varying object motion (i.e., position, velocity, acceleration). For machines to achieve human-like intelligence in real-world interactions, understanding these physical properties of objects is essential, forming the foundation for dynamic video prediction. While recent advancements in object-centric transformers have demonstrated potential in video prediction, they primarily focus on object appearance, often overlooking motion dynamics, which are crucial for modeling dynamic interactions and maintaining temporal consistency in complex environments. To address these limitations, we propose OCK, a dynamic video prediction model leveraging object-centric kinematics and object slots. We introduce a novel component named Object Kinematics that comprises…
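
To make the motion terms in the abstract concrete (position, velocity, acceleration), here is a minimal sketch of how velocity and acceleration could be derived from per-frame object positions with simple finite differences. This is an illustration only, not the paper's method: the function names `finite_differences` and `kinematics`, the 2D position format, and the sample track are all hypothetical, and the listing above does not say how OCK computes or represents these quantities.

```python
from typing import List, Tuple

Vec = Tuple[float, float]  # hypothetical (x, y) object centre in one frame


def finite_differences(track: List[Vec], dt: float = 1.0) -> List[Vec]:
    """Differences between consecutive entries, divided by the frame interval.

    The result has len(track) - 1 entries.
    """
    return [
        ((x1 - x0) / dt, (y1 - y0) / dt)
        for (x0, y0), (x1, y1) in zip(track, track[1:])
    ]


def kinematics(positions: List[Vec], dt: float = 1.0) -> Tuple[List[Vec], List[Vec]]:
    """Velocity is the first difference of position, acceleration the second."""
    velocity = finite_differences(positions, dt)
    acceleration = finite_differences(velocity, dt)
    return velocity, acceleration


# Assumed sample track: one object speeding up along the x axis over four frames.
positions = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (6.0, 0.0)]
velocity, acceleration = kinematics(positions)
```

With this sample track, `velocity` has three entries and `acceleration` has two, since each differencing step shortens the sequence by one frame. In a learned model these quantities would typically be estimated per object rather than read from ground-truth coordinates; that choice is outside what this sketch covers.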

Taxonomy

Topics: Time Series Analysis and Forecasting · Image Processing and 3D Reconstruction · Human Motion and Animation