Cross-View World Models

Rishabh Sharma; Gijs Hogervorst; Wayne E. Mackey; David J. Heeger; Stefano Martiniani

arXiv:2602.07277·cs.CV·February 10, 2026

Open Access

TL;DR

This paper introduces Cross-View World Models (XVWM), which learn view-invariant environment representations by predicting future states from the same or a different viewpoint, with the aim of improving planning and spatial understanding in agents.

Contribution

The paper presents a novel cross-view prediction objective for training world models, enabling multi-view consistency and improved spatial reasoning in agents.

Findings

01. Multi-view consistency improves spatially grounded representations.

02. The model enables planning from multiple viewpoints, including egocentric and bird's-eye views.

03. Predicting from different perspectives may facilitate perspective-taking in multi-agent systems.

Abstract

World models enable agents to plan by imagining future states, but existing approaches operate from a single viewpoint, typically egocentric, even when other perspectives would make planning easier; navigation, for instance, benefits from a bird's-eye view. We introduce Cross-View World Models (XVWM), trained with a cross-view prediction objective: given a sequence of frames from one viewpoint, predict the future state from the same or a different viewpoint after an action is taken. Enforcing cross-view consistency acts as geometric regularization: because the input and output views may share little or no visual overlap, to predict across viewpoints, the model must learn view-invariant representations of the environment's 3D structure. We train on synchronized multi-view gameplay data from Aimlabs, an aim-training platform providing precisely aligned multi-camera recordings with…
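The cross-view prediction objective described in the abstract can be illustrated by how a single training example might be assembled from synchronized multi-view recordings. The sketch below is an assumption-laden illustration, not the authors' pipeline: the names `CrossViewSample` and `sample_cross_view`, the view labels `"ego"` and `"bev"`, the uniform choice of source and target views, and the one-step prediction horizon are all hypothetical and not taken from the paper.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Sequence

Frame = object  # stand-in for an image tensor (hypothetical placeholder type)


@dataclass
class CrossViewSample:
    source_view: str      # viewpoint the context frames come from
    target_view: str      # viewpoint of the frame to predict (may equal source)
    context: List[Frame]  # past frames from the source view, oldest first
    action: int           # action taken at the last context step
    target: Frame         # next frame, as seen from the target view


def sample_cross_view(
    recording: Dict[str, Sequence[Frame]],
    actions: Sequence[int],
    context_len: int,
    rng: random.Random,
) -> CrossViewSample:
    """Draw one (context, action, target) tuple from time-aligned views.

    Assumes every view holds len(actions) + 1 frames, where actions[t]
    is the action that leads from frame t to frame t + 1.
    """
    n = len(actions)
    views = sorted(recording)
    if not views:
        raise ValueError("recording must contain at least one view")
    if any(len(recording[v]) != n + 1 for v in views):
        raise ValueError("every view needs len(actions) + 1 aligned frames")
    if not 1 <= context_len <= n:
        raise ValueError("context_len must be between 1 and len(actions)")

    # Source and target are drawn independently, so same-view and
    # cross-view prediction both occur (assumed uniform sampling).
    source = rng.choice(views)
    target = rng.choice(views)

    # t is the index of the last context frame; the target is frame t + 1.
    t = rng.randrange(context_len - 1, n)
    context = list(recording[source][t - context_len + 1 : t + 1])
    return CrossViewSample(source, target, context, actions[t], recording[target][t + 1])


if __name__ == "__main__":
    # Toy data: each "frame" is just a (view, timestep) tag.
    toy = {v: [(v, t) for t in range(6)] for v in ("ego", "bev")}
    print(sample_cross_view(toy, [0, 1, 2, 3, 4], context_len=3, rng=random.Random(0)))
```

A model trained on such tuples would receive `context`, `action`, and some encoding of `target_view`, and would be penalized for mismatch with `target`; the choice of loss and architecture is left open here because this excerpt does not specify them.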

Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Robotics and Sensor-Based Localization