Talk2Nav: Long-Range Vision-and-Language Navigation with Dual Attention and Spatial Memory

Arun Balajee Vasudevan; Dengxin Dai; Luc Van Gool

arXiv:1910.02029 · cs.CV · October 23, 2020

TL;DR

This paper introduces Talk2Nav, a large-scale dataset and a dual attention-based neural navigation model that enables robots to follow verbal instructions for long-range navigation using spatial memory.

Contribution

The paper presents a new dataset with verbal navigation instructions and a novel dual attention neural network that improves long-range vision-and-language navigation performance.

Findings

01 The proposed model significantly outperforms previous methods.

02 The dataset contains 10,714 routes with verbal instructions.

03 The dual attention mechanism effectively extracts relevant spatial information.

Abstract

The role of robots in society keeps expanding, bringing with it the necessity of interacting and communicating with humans. In order to keep such interaction intuitive, we provide automatic wayfinding based on verbal navigational instructions. Our first contribution is the creation of a large-scale dataset with verbal navigation instructions. To this end, we have developed an interactive visual navigation environment based on Google Street View; we further design an annotation method to highlight mined anchor landmarks and local directions between them in order to help annotators formulate typical, human references to those. The annotation task was crowdsourced on the AMT platform, to construct a new Talk2Nav dataset with $10{,}714$ routes. Our second contribution is a new learning method. Inspired by spatial cognition research on the mental conceptualization of navigational instructions,…
