FollowNet: Robot Navigation by Following Natural Language Directions with Deep Reinforcement Learning

Pararth Shah, Marek Fiser, Aleksandra Faust, J. Chase Kew, and Dilek Hakkani-Tur

arXiv:1805.06150 · cs.RO · September 20, 2018 · 43 cites

Open Access

TL;DR

FollowNet is a deep reinforcement learning-based neural architecture that enables robots to understand and follow complex natural language directions for navigation in simulated environments, improving success rates over baseline models.

Contribution

This work introduces FollowNet, an end-to-end neural model with attention mechanism for natural language guided robot navigation using multi-modal inputs.

Findings

01

Achieves 52% success rate on unseen instructions

02

Shows 30% improvement over baseline without attention

03

Successfully navigates paths not encountered during training

Abstract

Understanding and following directions provided by humans can enable robots to navigate effectively in unknown situations. We present FollowNet, an end-to-end differentiable neural architecture for learning multi-modal navigation policies. FollowNet maps natural language instructions as well as visual and depth inputs to locomotion primitives. FollowNet processes instructions using an attention mechanism conditioned on its visual and depth input to focus on the relevant parts of the command while performing the navigation task. Deep reinforcement learning (RL) with a sparse reward simultaneously learns the state representation, the attention function, and control policies. We evaluate our agent on a dataset of complex natural language directions that guide the agent through a rich and realistic dataset of simulated homes. We show that the FollowNet agent learns to execute previously unseen…
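The abstract describes attention over the instruction that is conditioned on what the agent currently observes. As a rough illustration only, the sketch below uses generic dot-product attention in plain Python; it is not the FollowNet architecture, whose details are not given here. The function names (`softmax`, `attend`), the two-dimensional embeddings, and the `context` vector standing in for fused visual and depth features are all assumptions made up for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(token_embeddings, context):
    """Weight instruction tokens by their dot-product similarity to a context vector.

    token_embeddings: one vector per instruction token (hypothetical values).
    context: a vector standing in for the agent's visual/depth features (assumption).
    Returns the attention weights and the weighted sum of token embeddings.
    """
    scores = [
        sum(t * c for t, c in zip(token, context))
        for token in token_embeddings
    ]
    weights = softmax(scores)
    dim = len(context)
    summary = [
        sum(w * token[d] for w, token in zip(weights, token_embeddings))
        for d in range(dim)
    ]
    return weights, summary


# Toy instruction with made-up 2-D embeddings (illustrative only).
tokens = ["turn", "right", "at", "the", "door"]
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 0.0], [0.0, 0.0], [0.0, 1.0]]

# Hypothetical observation feature that aligns with the "door" embedding.
context = [0.0, 2.0]

weights, summary = attend(embeddings, context)
focus = tokens[weights.index(max(weights))]
```

With this toy context, the largest weight falls on the token whose embedding best matches the observation, which is the sense in which attention lets a policy focus on the relevant part of the command as its observations change.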

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Speech and dialogue systems