PlaySlot: Learning Inverse Latent Dynamics for Controllable Object-Centric Video Prediction and Planning
Angel Villar-Corrales, Sven Behnke

TL;DR
PlaySlot introduces an object-centric video prediction model that infers latent actions from unlabeled videos, enabling versatile future forecasting and efficient robot behavior learning from unlabeled demonstrations.
Contribution
PlaySlot is the first model to infer both object representations and latent actions from unlabeled videos, enabling controllable object-centric video prediction and planning.
Findings
Outperforms existing stochastic and object-centric baselines in video prediction.
Enables learning robot behaviors efficiently from unlabeled video data.
Generates multiple plausible future scenarios conditioned on inferred or generated latent actions.
Abstract
Predicting future scene representations is a crucial task for enabling robots to understand and interact with the environment. However, most existing methods rely on videos and simulations with precise action annotations, limiting their ability to leverage the large amount of available unlabeled video data. To address this challenge, we propose PlaySlot, an object-centric video prediction model that infers object representations and latent actions from unlabeled video sequences. It then uses these representations to forecast future object states and video frames. PlaySlot allows the generation of multiple possible futures conditioned on latent actions, which can be inferred from video dynamics, provided by a user, or generated by a learned action policy, thus enabling versatile and interpretable world modeling. Our results show that PlaySlot outperforms both stochastic and…
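The loop described in the abstract (infer a latent action from consecutive object states, then forecast the next object states conditioned on that action) can be illustrated with a toy sketch. This is not the authors' implementation. The linear maps, the dimensions, the per-scene pooling of the action, and the names `infer_latent_action`, `predict_next_slots`, and `rollout` are all hypothetical stand-ins, and the randomly initialised weights take the place of learned modules.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 object slots of dimension 8, latent actions of dimension 3.
NUM_SLOTS, SLOT_DIM, ACTION_DIM = 4, 8, 3

# Random weights standing in for learned inverse- and forward-dynamics modules.
W_inv = rng.normal(scale=0.1, size=(2 * SLOT_DIM, ACTION_DIM))
W_fwd = rng.normal(scale=0.1, size=(SLOT_DIM + ACTION_DIM, SLOT_DIM))


def infer_latent_action(slots_t, slots_next):
    """Inverse dynamics: map two consecutive slot sets to one latent action.

    Assumption: a single action per scene, obtained by averaging over slots.
    """
    pair = np.concatenate([slots_t, slots_next], axis=-1)  # (NUM_SLOTS, 2 * SLOT_DIM)
    return np.tanh(pair @ W_inv).mean(axis=0)              # (ACTION_DIM,)


def predict_next_slots(slots_t, action):
    """Forward dynamics: predict the next slots conditioned on a latent action."""
    tiled = np.broadcast_to(action, (slots_t.shape[0], action.shape[0]))
    inputs = np.concatenate([slots_t, tiled], axis=-1)     # (NUM_SLOTS, SLOT_DIM + ACTION_DIM)
    return slots_t + inputs @ W_fwd                        # residual update


def rollout(slots_0, actions):
    """Autoregressively apply the forward model for a sequence of latent actions."""
    slots, trajectory = slots_0, []
    for action in actions:
        slots = predict_next_slots(slots, action)
        trajectory.append(slots)
    return np.stack(trajectory)                            # (T, NUM_SLOTS, SLOT_DIM)


# Slots for two consecutive frames (random placeholders for an object-centric encoder).
slots_0 = rng.normal(size=(NUM_SLOTS, SLOT_DIM))
slots_1 = rng.normal(size=(NUM_SLOTS, SLOT_DIM))

# Latent action inferred from the observed transition.
inferred = infer_latent_action(slots_0, slots_1)

# The same predictor accepts an inferred action or a user-provided one.
future_inferred = rollout(slots_0, [inferred, inferred])
future_user = rollout(slots_0, [np.ones(ACTION_DIM), np.ones(ACTION_DIM)])
```

Feeding different latent actions into the same forward model yields different futures from the same starting state, which is the sense in which the prediction is controllable. A learned action policy would simply be a third source of actions passed to `rollout`.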
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques
