Cross-View World Models
Rishabh Sharma, Gijs Hogervorst, Wayne E. Mackey, David J. Heeger, Stefano Martiniani

TL;DR
This paper introduces Cross-View World Models (XVWM), which learn view-invariant environment representations by predicting future states across different viewpoints, with the aim of improving planning and spatial understanding in agents.
Contribution
The paper presents a novel cross-view prediction objective for training world models, enabling multi-view consistency and improved spatial reasoning in agents.
Findings
Multi-view consistency improves spatially grounded representations.
The model enables planning from multiple viewpoints, including egocentric and bird's-eye views.
Predicting from different perspectives may facilitate perspective-taking in multi-agent systems.
Abstract
World models enable agents to plan by imagining future states, but existing approaches operate from a single viewpoint, typically egocentric, even when other perspectives would make planning easier; navigation, for instance, benefits from a bird's-eye view. We introduce Cross-View World Models (XVWM), trained with a cross-view prediction objective: given a sequence of frames from one viewpoint, predict the future state from the same or a different viewpoint after an action is taken. Enforcing cross-view consistency acts as geometric regularization: because the input and output views may share little or no visual overlap, to predict across viewpoints, the model must learn view-invariant representations of the environment's 3D structure. We train on synchronized multi-view gameplay data from Aimlabs, an aim-training platform providing precisely aligned multi-camera recordings with…
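The cross-view prediction objective described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The linear "encoder" and "dynamics" weights, the latent-space mean squared error, the one-hot target-view conditioning, and all sizes and function names are assumptions made for the example. The abstract does not say whether XVWM predicts in pixel space or latent space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
N_VIEWS, T, FRAME_DIM, ACTION_DIM, LATENT_DIM = 2, 4, 16, 3, 8

def init_params():
    """Random linear weights standing in for real networks (assumption)."""
    return {
        "enc": rng.normal(0, 0.1, (FRAME_DIM, LATENT_DIM)),
        "dyn": rng.normal(0, 0.1, (LATENT_DIM + ACTION_DIM + N_VIEWS, LATENT_DIM)),
    }

def encode(params, frames):
    """Encode a (T, FRAME_DIM) context into one latent by mean-pooling over time."""
    return (frames @ params["enc"]).mean(axis=0)

def predict(params, context, action, target_view):
    """Predict the latent of the next frame as seen from target_view."""
    view_onehot = np.eye(N_VIEWS)[target_view]
    z = encode(params, context)
    return np.concatenate([z, action, view_onehot]) @ params["dyn"]

def cross_view_loss(params, frames, action, next_frames):
    """Mean squared error averaged over every (source, target) view pair.

    frames:      (N_VIEWS, T, FRAME_DIM) synchronized context, one clip per view
    action:      (ACTION_DIM,) action taken after the context
    next_frames: (N_VIEWS, FRAME_DIM) frame that follows the action, per view
    """
    losses = []
    for src in range(N_VIEWS):
        for tgt in range(N_VIEWS):  # tgt == src is the ordinary same-view case
            pred = predict(params, frames[src], action, tgt)
            target = next_frames[tgt] @ params["enc"]
            losses.append(np.mean((pred - target) ** 2))
    return float(np.mean(losses))

# Random stand-in data; real training would use synchronized multi-view clips.
params = init_params()
frames = rng.normal(size=(N_VIEWS, T, FRAME_DIM))
action = rng.normal(size=ACTION_DIM)
next_frames = rng.normal(size=(N_VIEWS, FRAME_DIM))
loss = cross_view_loss(params, frames, action, next_frames)
```

In this sketch the same predictor handles both the same-view and cross-view cases, and only the target-view code changes. Because the source and target frames may share no pixels, the only way to lower the cross-view terms is for the shared latent to carry information about the scene that does not depend on the camera.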
Taxonomy
Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Robotics and Sensor-Based Localization
