TL;DR
Pathdreamer is a novel visual world model that generates plausible high-resolution 360-degree views of unseen indoor spaces, aiding navigation tasks by predicting diverse possible scenes based on limited observations.
Contribution
The paper introduces a visual world model that generates realistic and diverse predictions of unvisited indoor viewpoints, supporting embodied navigation.
Findings
Pathdreamer improves navigation planning by predicting unseen views.
Planning ahead with Pathdreamer's predictions yields about half the benefit of looking ahead at actual observations of unvisited viewpoints.
The model encodes useful visual, spatial, and semantic knowledge of indoor environments.
Abstract
People navigating in unfamiliar buildings take advantage of myriad visual, spatial and semantic cues to efficiently achieve their navigation goals. Towards equipping computational agents with similar capabilities, we introduce Pathdreamer, a visual world model for agents navigating in novel indoor environments. Given one or more previous visual observations, Pathdreamer generates plausible high-resolution 360° visual observations (RGB, semantic segmentation and depth) for viewpoints that have not been visited, in buildings not seen during training. In regions of high uncertainty (e.g. predicting around corners, imagining the contents of an unseen room), Pathdreamer can predict diverse scenes, allowing an agent to sample multiple realistic outcomes for a given trajectory. We demonstrate that Pathdreamer encodes useful and accessible visual, spatial and semantic knowledge about human…
