DREAMWALKER: Mental Planning for Continuous Vision-Language Navigation
Hanqing Wang, Wei Liang, Luc Van Gool, Wenguan Wang

TL;DR
DREAMWALKER introduces a world-model-based approach to vision-language navigation. Agents plan mentally by simulating future scenarios internally, which leads to more strategic and interpretable navigation behavior.
Contribution
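The paper presents a novel world model for VLN-CE (Vision-and-Language Navigation in Continuous Environments). The model lets the agent internally simulate the outcomes of future actions, which improves planning and makes its decisions more transparent.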
Findings
Outperforms existing model-free VLN-CE agents in navigation tasks.
Enables strategic planning through mental experiments.
Improves decision transparency and interpretability.
Abstract
VLN-CE is a recently released embodied task in which AI agents must navigate a freely traversable environment to reach a distant target location, given language instructions. It poses great challenges due to the huge space of possible strategies. Driven by the belief that the ability to anticipate the consequences of future actions is crucial for the emergence of intelligent and interpretable planning behavior, we propose DREAMWALKER, a world-model-based VLN-CE agent. The world model is built to summarize the visual, topological, and dynamic properties of the complicated continuous environment into a discrete, structured, and compact representation. DREAMWALKER can simulate and evaluate possible plans entirely in such an internal abstract world before executing costly actions. As opposed to existing model-free VLN-CE agents simply making greedy decisions in the real world, which easily…
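The contrast between greedy, model-free decisions and planning inside an internal abstract world can be made concrete with a toy Python sketch. It is not DREAMWALKER's algorithm. The graph `GRAPH`, the per-waypoint progress scores `SCORE`, and the functions `greedy_step` and `mental_plan` are all hypothetical. Exhaustive depth-limited lookahead stands in for whatever search procedure the paper actually uses. The abstract world is a small discrete graph of imagined waypoints, and every candidate plan is simulated and scored inside it before any real action is taken.

```python
from typing import Dict, List, Tuple

# Hypothetical abstract world model: a discrete graph of imagined waypoints.
# Each node maps to the waypoints reachable from it in one step.
GRAPH: Dict[str, List[str]] = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "E"],
    "D": ["B"],
    "E": ["C"],
}

# Hypothetical estimate of how close each waypoint is to the instruction's
# goal (higher is better). B looks promising but leads only to a dead end.
SCORE: Dict[str, float] = {"A": 0.0, "B": 0.6, "C": 0.4, "D": 0.1, "E": 0.9}


def greedy_step(graph: Dict[str, List[str]], score: Dict[str, float], node: str) -> str:
    """Model-free style decision: pick the best-looking neighbor, no lookahead."""
    return max(graph[node], key=lambda n: score[n])


def mental_plan(
    graph: Dict[str, List[str]],
    score: Dict[str, float],
    start: str,
    depth: int,
) -> Tuple[float, List[str]]:
    """Imagine every loop-free path of up to `depth` steps inside the world
    model and return the (score, path) whose final waypoint scores highest."""
    best_score, best_path = score[start], [start]
    stack: List[List[str]] = [[start]]
    while stack:
        path = stack.pop()
        # Evaluate the imagined outcome without acting in the real environment.
        if score[path[-1]] > best_score:
            best_score, best_path = score[path[-1]], path
        if len(path) - 1 < depth:
            for nxt in graph[path[-1]]:
                if nxt not in path:  # skip revisits within one imagined rollout
                    stack.append(path + [nxt])
    return best_score, best_path


if __name__ == "__main__":
    print(greedy_step(GRAPH, SCORE, "A"))     # B
    print(mental_plan(GRAPH, SCORE, "A", 2))  # (0.9, ['A', 'C', 'E'])
```

Here `greedy_step` moves to B, which looks best one step ahead but is a dead end. `mental_plan` discovers the path A, C, E entirely in imagination. In a receding-horizon setup (an assumption of this sketch), the agent would execute only the first step, C, and then replan.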
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling
