Towards an Interpretable Latent Space in Structured Models for Video Prediction
Rushil Gupta, Vishal Sharma, Yash Jain, Yitao Liang, Guy Van den Broeck, Parag Singla

TL;DR
This paper introduces an object-centric video prediction model that incorporates physical laws into its latent space, enhancing interpretability and object localization without explicit supervision, and improves prediction accuracy across multiple domains.
Contribution
The paper integrates physical laws into a contrastive learning framework for object-centric video prediction, yielding a more interpretable latent space and more accurate object localization.
Findings
Improved object localization in 3 out of 4 domains.
Learned feature maps resemble actual object positions.
Enhanced interpretability of the latent space.
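One simple way to check whether a feature map "resembles" an object position is to reduce the map to a single coordinate and compare it with the known location. The sketch below computes an activation-weighted centroid. It is only an illustration: the function name `feature_map_centroid`, the assumption of non-negative activations, and the use of a centroid at all are choices made here, not details taken from the paper.

```python
from typing import List, Tuple


def feature_map_centroid(fmap: List[List[float]]) -> Tuple[float, float]:
    """Return the (row, col) centroid of a 2D feature map.

    Hypothetical helper: assumes non-negative activations, so the map
    can be treated as an unnormalized mass distribution over pixels.
    """
    total = sum(sum(row) for row in fmap)
    if total <= 0:
        raise ValueError("feature map has no positive activation")
    # Weight each row index by the total activation in that row.
    row_c = sum(r * sum(row) for r, row in enumerate(fmap)) / total
    # Weight each column index by the activation at that column.
    col_c = sum(c * v for row in fmap for c, v in enumerate(row)) / total
    return row_c, col_c


# Toy 3x3 map with activation concentrated in the right-hand column.
toy_map = [
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
]
centroid = feature_map_centroid(toy_map)
```

Comparing such a centroid with a ground-truth object position gives a rough localization error, even though no position labels were used during training.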
Abstract
We focus on the task of future frame prediction in video governed by underlying physical dynamics. We work with models that are object-centric, i.e., that explicitly operate on object representations, and that propagate a loss in the latent space. Specifically, our research builds on recent work by Kipf et al. (2020), which predicts the next state via contrastive learning of object interactions in a latent space using a Graph Neural Network. We argue that injecting explicit inductive bias into the model, in the form of general physical laws, can not only make the model more interpretable but also improve its overall prediction quality. As a natural by-product, our model learns feature maps that closely resemble the actual object positions in the image, without any explicit supervision of object positions at training time. In comparison with earlier works…
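The contrastive objective in latent space mentioned in the abstract can be sketched as a hinge loss over per-object latent vectors, in the general style of Kipf et al. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `energy` and `contrastive_hinge_loss` are hypothetical, the predicted latent change `delta` stands in for the output of a transition model (a Graph Neural Network in the cited work), and the physical-law inductive bias the paper adds is not modeled here.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two latent vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def energy(states: List[Vector], targets: List[Vector]) -> float:
    """Mean squared distance over object slots."""
    return sum(squared_distance(s, t) for s, t in zip(states, targets)) / len(states)


def contrastive_hinge_loss(
    z_t: List[Vector],
    delta: List[Vector],
    z_next: List[Vector],
    z_neg: List[Vector],
    margin: float = 1.0,
) -> float:
    """Hinge-style contrastive loss in latent space (illustrative only).

    z_t    : per-object latents at time t
    delta  : predicted per-object latent change (placeholder for a
             transition model's output; an assumption of this sketch)
    z_next : per-object latents encoded from the true next frame
    z_neg  : per-object latents of a negative (unrelated) sample
    """
    predicted = [[zi + di for zi, di in zip(z, d)] for z, d in zip(z_t, delta)]
    positive = energy(predicted, z_next)  # pull the prediction toward the true next state
    negative = energy(z_neg, z_next)      # push negatives at least `margin` away
    return positive + max(0.0, margin - negative)


# Two objects with 2D latents; the predicted change matches the true transition.
z_t = [[0.0, 0.0], [1.0, 1.0]]
delta = [[1.0, 0.0], [0.0, 1.0]]
z_next = [[1.0, 0.0], [1.0, 2.0]]

far_negative = [[5.0, 5.0], [5.0, 5.0]]
loss_far = contrastive_hinge_loss(z_t, delta, z_next, far_negative)

# A negative identical to the true next state incurs the full margin penalty.
loss_near = contrastive_hinge_loss(z_t, delta, z_next, z_next)
```

The first term rewards accurate next-state prediction, and the hinge term stops contributing once negatives are at least `margin` away, which keeps the encoder from collapsing all states to a single point.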
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)
Methods: Graph Neural Network · Contrastive Learning
