Towards an Interpretable Latent Space in Structured Models for Video Prediction

Rushil Gupta; Vishal Sharma; Yash Jain; Yitao Liang; Guy Van den Broeck; Parag Singla

arXiv:2107.07713·cs.LG·July 19, 2021

Open Access

TL;DR

This paper introduces an object-centric video prediction model that incorporates physical laws into its latent space, enhancing interpretability and object localization without explicit supervision, and improves prediction accuracy across multiple domains.

Contribution

It integrates physical laws into a contrastive learning framework for object-centric video prediction, resulting in more interpretable and accurate object localization.

Findings

01 Improved object localization in 3 out of 4 domains.

02 Learned feature maps resemble actual object positions.

03 Enhanced interpretability of the latent space.

Abstract

We focus on the task of future frame prediction in video governed by underlying physical dynamics. We work with object-centric models, i.e., models that operate explicitly on object representations and propagate a loss in the latent space. Specifically, our research builds on recent work by Kipf et al. (2020), which predicts the next state via contrastive learning of object interactions in a latent space using a Graph Neural Network. We argue that injecting explicit inductive bias into the model, in the form of general physical laws, can not only make the model more interpretable but also improve its overall predictions. As a natural by-product, our model can learn feature maps that closely resemble actual object positions in the image, without any explicit supervision of object positions at training time. In comparison with earlier works…
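
As a rough illustration of the contrastive latent-space objective the abstract refers to, the sketch below computes a hinge-style loss in the spirit of Kipf et al.: the predicted next latent state is pulled toward the encoded next state, and a negative (unrelated) state is pushed at least a margin away. It is a simplified sketch, not the authors' implementation. The function names, the per-object list layout, and the default margin are assumptions made for illustration, and the physics-based inductive bias proposed in the paper is not modeled here.

```python
from typing import List

Vector = List[float]
State = List[Vector]  # hypothetical layout: one latent vector per object slot


def squared_distance(a: State, b: State) -> float:
    """Sum of squared differences over all object slots and dimensions."""
    return sum((x - y) ** 2 for va, vb in zip(a, b) for x, y in zip(va, vb))


def contrastive_loss(z_t: State, delta: State, z_next: State,
                     z_neg: State, margin: float = 1.0) -> float:
    """Hinge-style contrastive loss on latent object states.

    z_t    : encoded current state
    delta  : change predicted by a transition model (e.g. a GNN), assumed given
    z_next : encoded true next state
    z_neg  : encoded negative sample (a state from an unrelated frame)
    """
    # The transition model predicts a change that is added to the current state.
    predicted = [[x + d for x, d in zip(v, dv)] for v, dv in zip(z_t, delta)]
    # Positive term: the prediction should match the true next state.
    positive = squared_distance(predicted, z_next)
    # Negative term: penalize negatives closer than the margin to the next state.
    negative = max(0.0, margin - squared_distance(z_neg, z_next))
    return positive + negative


z_t = [[0.0, 0.0], [1.0, 1.0]]
delta = [[1.0, 0.0], [0.0, 0.0]]
z_next = [[1.0, 0.0], [1.0, 1.0]]
near_negative = [[1.0, 0.5], [1.0, 1.0]]
far_negative = [[5.0, 5.0], [1.0, 1.0]]

loss_near = contrastive_loss(z_t, delta, z_next, near_negative)
loss_far = contrastive_loss(z_t, delta, z_next, far_negative)
```

With a perfect prediction, only the negative term contributes: a negative sample inside the margin is penalized, while one far from the true next state adds nothing.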

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)

Methods: Graph Neural Network · Contrastive Learning