A Critical View of Vision-Based Long-Term Dynamics Prediction Under Environment Misalignment
Hanchen Xie, Jiageng Zhu, Mahyar Khayatkhoei, Jiazhi Li, Mohamed E. Hussein, Wael AbdAlmageed

TL;DR
This paper critically examines the limitations of vision-based long-term dynamics prediction models under environment misalignment, introduces challenging datasets, and proposes a mitigation strategy to improve robustness across domains and contexts.
Contribution
It introduces four new datasets for evaluating environment misalignment in dynamics prediction and proposes a mitigation approach that significantly reduces the impact of cross-domain challenges.
Findings
RPCIN performs well under standard conditions but struggles under environment misalignment.
The proposed mitigation strategy dramatically alleviates cross-domain prediction issues.
The datasets reveal specific weaknesses of current vision-based models in challenging environments.
Abstract
Dynamics prediction, which is the problem of predicting future states of scene objects based on current and prior states, is drawing increasing attention as an instance of learning physics. To solve this problem, Region Proposal Convolutional Interaction Network (RPCIN), a vision-based model, was proposed and achieved state-of-the-art performance in long-term prediction. RPCIN only takes raw images and simple object descriptions, such as the bounding box and segmentation mask of each object, as input. However, despite its success, the model's capability can be compromised under conditions of environment misalignment. In this paper, we investigate two challenging conditions for environment misalignment: Cross-Domain and Cross-Context by proposing four datasets that are designed for these challenges: SimB-Border, SimB-Split, BlenB-Border, and BlenB-Split. The datasets cover two domains…
Taxonomy
Topics: Advanced Neural Network Applications · Neural Networks and Applications · Human Pose and Action Recognition
