Listen, Attend, and Walk: Neural Mapping of Navigational Instructions to Action Sequences
Hongyuan Mei, Mohit Bansal, Matthew R. Walter

TL;DR
This paper introduces a neural sequence-to-sequence model with multi-level attention that translates natural language navigational instructions into action sequences. It achieves state-of-the-art results on a single-sentence benchmark without relying on specialized linguistic resources such as parsers or seed lexicons.
Contribution
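It presents a novel multi-level aligner within an LSTM-based encoder-decoder framework. The aligner attends over multiple abstractions of the input instruction, which lets the model focus on the parts of the sentence most relevant to the current world state without specialized linguistic tools.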
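To make the idea of a "multi-level" aligner concrete, here is a minimal pure-Python sketch, not the authors' implementation. It assumes Bahdanau-style additive attention: each input position j is scored from the previous decoder state s_{t-1} and the encoder annotation h_j, and the context vector is the attention-weighted sum of the concatenation [x_j; h_j] of the word embedding and the annotation. The function name `multi_level_align` and the parameter names `w`, `u`, `v` are hypothetical; consult the paper for the exact parameterization.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(m: Sequence[Sequence[float]], v: Sequence[float]) -> Vector:
    return [dot(row, v) for row in m]


def softmax(scores: Sequence[float]) -> Vector:
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_level_align(
    words: Sequence[Vector],        # x_j: word embeddings (low-level view)
    annotations: Sequence[Vector],  # h_j: encoder LSTM states (high-level view)
    decoder_state: Vector,          # s_{t-1}: previous decoder hidden state
    w: Sequence[Sequence[float]],   # hypothetical projection of s_{t-1}
    u: Sequence[Sequence[float]],   # hypothetical projection of h_j
    v: Vector,                      # hypothetical scoring vector
) -> Tuple[Vector, Vector]:
    """Return (attention weights alpha_j, context vector z_t)."""
    ws = matvec(w, decoder_state)
    # Additive score: v^T tanh(W s_{t-1} + U h_j) for every position j.
    scores = [
        dot(v, [math.tanh(a + b) for a, b in zip(ws, matvec(u, h))])
        for h in annotations
    ]
    alphas = softmax(scores)
    # Multi-level context: weight the concatenation [x_j; h_j], so the decoder
    # sees both the raw words and their contextual encodings.
    levels = [list(x) + list(h) for x, h in zip(words, annotations)]
    dim = len(levels[0])
    context = [sum(a * lv[k] for a, lv in zip(alphas, levels)) for k in range(dim)]
    return alphas, context
```

With 2-dimensional embeddings and annotations, the returned context vector has 4 entries, and the weights form a probability distribution over the instruction's words. Concatenating rather than summing the two levels keeps the lexical and contextual signals separate, so the decoder can learn how much to rely on each.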
Findings
Achieves the best results reported to date on a benchmark single-sentence instruction dataset.
Achieves competitive performance in the limited-training multi-sentence setting.
Presents ablation studies showing how much each model component contributes.
Abstract
We propose a neural sequence-to-sequence model for direction following, a task that is essential to realizing effective autonomous agents. Our alignment-based encoder-decoder model with long short-term memory recurrent neural networks (LSTM-RNN) translates natural language instructions to action sequences based upon a representation of the observable world state. We introduce a multi-level aligner that empowers our model to focus on sentence "regions" salient to the current world state by using multiple abstractions of the input sentence. In contrast to existing methods, our model uses no specialized linguistic resources (e.g., parsers) or task-specific annotations (e.g., seed lexicons). It is therefore generalizable, yet still achieves the best results reported to date on a benchmark single-sentence dataset and competitive results for the limited-training multi-sentence setting. We…
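Both the encoder and the decoder in the abstract are LSTM-RNNs, built from the sigmoid and tanh activations listed under Methods. The sketch below shows one step of a standard LSTM cell (no peephole connections) in pure Python. It is an illustration, not the paper's exact cell. The parameter names (`W_i`, `b_i`, and so on) are hypothetical. As an assumption for illustration only, a decoder step could feed in an input vector formed by concatenating an encoding of the observable world state with the aligner's context vector.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(m: Sequence[Sequence[float]], v: Sequence[float], b: Sequence[float]) -> Vector:
    # Computes m @ v + b row by row.
    return [sum(mij * vj for mij, vj in zip(row, v)) + bi for row, bi in zip(m, b)]


def lstm_step(
    x: Vector,       # input at step t (e.g. world-state encoding + context; assumed)
    h_prev: Vector,  # previous hidden state h_{t-1}
    c_prev: Vector,  # previous memory cell c_{t-1}
    params: Dict[str, Matrix],
) -> Tuple[Vector, Vector]:
    """One standard LSTM update; returns (h_t, c_t)."""
    inp = list(x) + list(h_prev)  # every gate reads [x_t; h_{t-1}]
    i = [sigmoid(z) for z in affine(params["W_i"], inp, params["b_i"][0])]  # input gate
    f = [sigmoid(z) for z in affine(params["W_f"], inp, params["b_f"][0])]  # forget gate
    o = [sigmoid(z) for z in affine(params["W_o"], inp, params["b_o"][0])]  # output gate
    g = [math.tanh(z) for z in affine(params["W_g"], inp, params["b_g"][0])]  # candidate
    # New memory: keep part of the old cell and add the gated candidate.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    # Exposed state: squash the memory with tanh and scale by the output gate.
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c
```

Biases are stored as 1-row matrices (`params["b_i"][0]`) only so that every parameter shares one type. With all weights and biases set to zero, each gate outputs 0.5 and the candidate is 0, so the cell simply halves its previous memory.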
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory
