Talk2Nav: Long-Range Vision-and-Language Navigation with Dual Attention and Spatial Memory
Arun Balajee Vasudevan, Dengxin Dai, Luc Van Gool

TL;DR
This paper introduces Talk2Nav, a large-scale dataset and a dual attention-based neural navigation model that enables robots to follow verbal instructions for long-range navigation using spatial memory.
Contribution
The paper presents a new dataset with verbal navigation instructions and a novel dual attention neural network that improves long-range vision-and-language navigation performance.
Findings
The proposed model is reported to significantly outperform previous methods.
The dataset contains over 10,700 routes with verbal instructions.
The dual attention mechanism effectively extracts relevant spatial information.
Abstract
The role of robots in society keeps expanding, bringing with it the necessity of interacting and communicating with humans. In order to keep such interaction intuitive, we provide automatic wayfinding based on verbal navigational instructions. Our first contribution is the creation of a large-scale dataset with verbal navigation instructions. To this end, we have developed an interactive visual navigation environment based on Google Street View; we further design an annotation method to highlight mined anchor landmarks and local directions between them in order to help annotators formulate typical, human references to those. The annotation task was crowdsourced on the AMT platform, to construct a new Talk2Nav dataset with 10,714 routes. Our second contribution is a new learning method. Inspired by spatial cognition research on the mental conceptualization of navigational instructions,…
