DDLP: Unsupervised Object-Centric Video Prediction with Deep Dynamic Latent Particles
Tal Daniel, Aviv Tamar

TL;DR
DDLP introduces an efficient, interpretable object-centric video prediction method using dynamic latent particles, enabling state-of-the-art results and flexible "what-if" scenario generation in videos.
Contribution
The paper presents DDLP, an object-centric video prediction algorithm built on the deep latent particle (DLP) representation, which is more efficient and interpretable than slot- or patch-based representations.
Findings
Achieved state-of-the-art object-centric video prediction results.
Enabled flexible "what-if" scenario generation.
Demonstrated efficient diffusion-based unconditional video generation.
Abstract
We propose a new object-centric video prediction algorithm based on the deep latent particle (DLP) representation. In comparison to existing slot- or patch-based representations, DLPs model the scene using a set of keypoints with learned parameters for properties such as position and size, and are both efficient and interpretable. Our method, deep dynamic latent particles (DDLP), yields state-of-the-art object-centric video prediction results on several challenging datasets. The interpretable nature of DDLP allows us to perform "what-if" generation, that is, to predict the consequence of changing properties of objects in the initial frames, and DLP's compact structure enables efficient diffusion-based unconditional video generation. Videos, code and pre-trained models are available: https://taldatech.github.io/ddlp-web
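To make the "what-if" idea concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a hypothetical `Particle` record holding a position and a scale, and it swaps in a toy constant-velocity rollout (`toy_rollout`, with assumed `vx`/`vy` fields) where DDLP would use its learned dynamics. The names `Particle`, `what_if`, and `toy_rollout` are invented for this example. The point is only that a representation with explicit per-object properties can be edited in the initial frame and then rolled forward.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Particle:
    """Hypothetical particle: a keypoint with interpretable attributes."""
    x: float          # horizontal position
    y: float          # vertical position
    scale: float      # object size
    vx: float = 0.0   # assumed velocity, used only by the toy dynamics below
    vy: float = 0.0


def toy_rollout(particles: List[Particle], steps: int) -> List[List[Particle]]:
    """Stand-in dynamics (constant velocity); NOT DDLP's learned predictor."""
    frames = [list(particles)]
    for _ in range(steps):
        prev = frames[-1]
        frames.append([replace(p, x=p.x + p.vx, y=p.y + p.vy) for p in prev])
    return frames


def what_if(particles: List[Particle], index: int, **changes) -> List[Particle]:
    """Return a copy of the initial particles with one particle's properties edited."""
    edited = list(particles)
    edited[index] = replace(edited[index], **changes)
    return edited


# Initial "frame": two objects described by their particles.
scene = [
    Particle(x=0.0, y=0.0, scale=0.2, vx=0.1),
    Particle(x=0.5, y=0.5, scale=0.1, vy=-0.1),
]

baseline = toy_rollout(scene, steps=3)
# What if the first object had started further to the left?
edited = toy_rollout(what_if(scene, 0, x=-0.5), steps=3)

print("baseline final x:", baseline[-1][0].x)
print("edited final x:  ", edited[-1][0].x)
```

In this toy setting the objects do not interact, so editing one particle leaves the other's trajectory unchanged; a learned dynamics model could propagate the edit to other objects as well.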
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Cancer-related molecular mechanisms research
