Active Visual Information Gathering for Vision-Language Navigation

Hanqing Wang; Wenguan Wang; Tianmin Shu; Wei Liang; Jianbing Shen

arXiv:2007.08037 · cs.CV · August 21, 2020 · 5 cites

Open Access · 1 Repo

TL;DR

This paper introduces an active exploration framework for vision-language navigation, enabling agents to intelligently gather environmental information to improve navigation accuracy in photo-realistic settings.

Contribution

The paper presents an end-to-end approach to learning an exploration policy that decides when and where to explore and what information to gather, improving navigation robustness.

Findings

01

Significant improvement in navigation performance on the Room-to-Room (R2R) benchmark.

02

Emergence of effective exploration strategies during training.

03

Enhanced results across all VLN settings, including single run, pre-exploration, and beam search.

Abstract

Vision-language navigation (VLN) is the task of requiring an agent to carry out navigational instructions inside photo-realistic environments. One of the key challenges in VLN is how to navigate robustly by mitigating the uncertainty caused by ambiguous instructions and insufficient observation of the environment. Agents trained by current approaches typically suffer from this uncertainty and consequently struggle to avoid random and inefficient actions at every step. In contrast, when humans face such a challenge, they can still maintain robust navigation by actively exploring the surroundings to gather more information and thus make more confident navigation decisions. This work draws inspiration from human navigation behavior and endows an agent with an active information gathering ability for a more intelligent vision-language navigation policy. To achieve this, we propose an…

Code & Models

Repositories

HanqingWangAI/Active_VLN
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning