DREAMWALKER: Mental Planning for Continuous Vision-Language Navigation

Hanqing Wang; Wei Liang; Luc Van Gool; Wenguan Wang

arXiv:2308.07498 · cs.CV · August 16, 2023 · 1 citation

TL;DR

DREAMWALKER introduces a world-model-based approach to vision-language navigation. The agent performs mental planning by simulating future scenarios internally, which leads to more strategic and interpretable navigation behavior.

Contribution

The paper presents a novel world model for VLN-CE that allows for internal simulation of future actions, improving planning and decision transparency.

Findings

1. Outperforms existing model-free VLN-CE agents in navigation tasks.

2. Enables strategic planning through "mental experiments".

3. Improves decision transparency and interpretability.

Abstract

VLN-CE is a recently released embodied task, where AI agents need to navigate a freely traversable environment to reach a distant target location, given language instructions. It poses great challenges due to the huge space of possible strategies. Driven by the belief that the ability to anticipate the consequences of future actions is crucial for the emergence of intelligent and interpretable planning behavior, we propose DREAMWALKER -- a world model based VLN-CE agent. The world model is built to summarize the visual, topological, and dynamic properties of the complicated continuous environment into a discrete, structured, and compact representation. DREAMWALKER can simulate and evaluate possible plans entirely in such internal abstract world, before executing costly actions. As opposed to existing model-free VLN-CE agents simply making greedy decisions in the real world, which easily…
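The abstract contrasts model-free agents that act greedily in the real world with an agent that first simulates and scores candidate plans inside a compact, discrete internal model. The toy below is a minimal sketch of that contrast under loudly stated assumptions. The "world model" is a hand-built topological graph with made-up per-node rewards and edge costs (nothing is learned), and the names `GRAPH`, `REWARD`, `greedy_step`, `rollouts`, `plan_score`, `mental_plan`, and the discount `gamma` are all hypothetical. Exhaustive depth-limited search stands in as the planner. This is not the paper's world model or planning algorithm.

```python
from typing import Dict, List, Tuple

# Hypothetical, hand-set stand-in for a discrete world model (NOT learned):
# node -> {neighbor: traversal cost}, plus a per-node reward that loosely plays
# the role of "how well this place matches the instruction".
Graph = Dict[str, Dict[str, float]]

GRAPH: Graph = {
    "S": {"A": 0.1, "B": 0.1},
    "A": {"S": 0.1},            # attractive-looking dead end
    "B": {"S": 0.1, "C": 0.1},
    "C": {"B": 0.1, "G": 0.1},
    "G": {"C": 0.1},            # true target
}
REWARD: Dict[str, float] = {"S": 0.0, "A": 1.0, "B": 0.2, "C": 0.3, "G": 5.0}


def greedy_step(graph: Graph, reward: Dict[str, float], node: str) -> str:
    """Model-free-style choice: pick the neighbor with the best immediate payoff."""
    return max(graph[node], key=lambda n: reward[n] - graph[node][n])


def rollouts(graph: Graph, start: str, depth: int) -> List[List[str]]:
    """Enumerate every simple path of 1..depth moves inside the internal graph."""
    paths: List[List[str]] = []

    def extend(path: List[str]) -> None:
        if len(path) > 1:
            paths.append(path)
        if len(path) - 1 == depth:
            return
        for nxt in graph[path[-1]]:
            if nxt not in path:  # avoid revisiting nodes within one imagined plan
                extend(path + [nxt])

    extend([start])
    return paths


def plan_score(graph: Graph, reward: Dict[str, float], path: List[str],
               gamma: float = 0.9) -> float:
    """Discounted sum of (reward - cost) along an imagined path."""
    return sum(gamma ** t * (reward[b] - graph[a][b])
               for t, (a, b) in enumerate(zip(path, path[1:])))


def mental_plan(graph: Graph, reward: Dict[str, float], node: str,
                depth: int = 3) -> Tuple[List[str], float]:
    """Simulate all plans up to `depth` moves; return the best one and its score."""
    best = max(rollouts(graph, node, depth),
               key=lambda p: plan_score(graph, reward, p))
    return best, plan_score(graph, reward, best)


if __name__ == "__main__":
    print("greedy next node:", greedy_step(GRAPH, REWARD, "S"))
    plan, score = mental_plan(GRAPH, REWARD, "S")
    print("planned path:", plan, "score:", round(score, 3))
    print("execute only the first move:", plan[1])
```

In this toy graph the greedy step heads for the dead end `A`, while the simulated lookahead returns S → B → C → G. Only the first move of the plan is executed, so the agent can re-plan after each real step. All simulation happens on the small graph, which reflects the abstract's point that plans are evaluated "before executing costly actions".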

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling