A Critical View of Vision-Based Long-Term Dynamics Prediction Under Environment Misalignment

Hanchen Xie; Jiageng Zhu; Mahyar Khayatkhoei; Jiazhi Li; Mohamed E. Hussein; Wael AbdAlmageed

arXiv:2305.07648 · cs.CV · June 16, 2023 · 1 citation

TL;DR

This paper critically examines how vision-based long-term dynamics prediction models, RPCIN in particular, degrade under environment misalignment. It introduces four datasets covering the Cross-Domain and Cross-Context conditions and proposes a mitigation strategy for the Cross-Domain challenge.

Contribution

The paper introduces four new datasets (SimB-Border, SimB-Split, BlenB-Border, and BlenB-Split) for evaluating environment misalignment in dynamics prediction, and proposes a mitigation approach that significantly reduces the impact of the Cross-Domain challenge.

Findings

01 RPCIN performs well under standard conditions but struggles under environment misalignment.

02 The proposed mitigation strategy dramatically alleviates cross-domain prediction issues.

03 The datasets reveal specific weaknesses of current vision-based models in challenging environments.

Abstract

Dynamics prediction, the problem of predicting future states of scene objects from their current and prior states, is drawing increasing attention as an instance of learning physics. To solve this problem, the Region Proposal Convolutional Interaction Network (RPCIN), a vision-based model, was proposed and achieved state-of-the-art performance in long-term prediction. RPCIN takes only raw images and simple object descriptions, such as the bounding box and segmentation mask of each object, as input. However, despite its success, the model's capability can be compromised under conditions of environment misalignment. In this paper, we investigate two challenging conditions for environment misalignment, Cross-Domain and Cross-Context, by proposing four datasets designed for these challenges: SimB-Border, SimB-Split, BlenB-Border, and BlenB-Split. The datasets cover two domains…

Code & Models

Repositories

vimal-isi-edu/vdp-emc (PyTorch, official)

Videos

A Critical View of Vision-Based Long-Term Dynamics Prediction Under Environment Misalignment · SlidesLive

Taxonomy

Topics: Advanced Neural Network Applications · Neural Networks and Applications · Human Pose and Action Recognition