Following High-level Navigation Instructions on a Simulated Quadcopter with Imitation Learning

Valts Blukis; Nataly Brukhim; Andrew Bennett; Ross A. Knepper; and Yoav Artzi

arXiv:1806.00047·cs.AI·June 4, 2018

TL;DR

This paper presents GSMN, a neural network that maps images and instructions to drone control commands using explicit semantic mapping, trained with DAggerFM, and demonstrates improved performance and interpretability in simulated environments.

Contribution

The paper introduces GSMN, a novel neural architecture that explicitly constructs semantic maps for instruction following in quadcopters, enhancing performance and interpretability.

Findings

1. GSMN outperforms strong neural baselines in simulation.

2. Explicit mapping improves instruction-following accuracy.

3. Learned maps are interpretable and grounded in the environment.

Abstract

We introduce a method for following high-level navigation instructions by mapping directly from images, instructions and pose estimates to continuous low-level velocity commands for real-time control. The Grounded Semantic Mapping Network (GSMN) is a fully-differentiable neural network architecture that builds an explicit semantic map in the world reference frame by incorporating a pinhole camera projection model within the network. The information stored in the map is learned from experience, while the local-to-world transformation is computed explicitly. We train the model using DAggerFM, a modified variant of DAgger that trades tabular convergence guarantees for improved training speed and memory use. We test GSMN in virtual environments on a realistic quadcopter simulator and show that incorporating explicit mapping and grounding modules allows GSMN to outperform strong neural…
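The abstract says the local-to-world transformation is computed explicitly with a pinhole camera model. The geometry behind that idea can be sketched for a single pixel as follows. This is an illustrative sketch only, not the authors' implementation: GSMN applies the projection to learned feature maps inside a differentiable network, whereas this example handles one pixel in plain Python. The function name `pixel_to_ground`, the flat ground plane at z = 0, and the camera axis convention (x right, y down, z forward) are all assumptions made for the example.

```python
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def pixel_to_ground(
    u: float, v: float,
    fx: float, fy: float, cx: float, cy: float,
    cam_pos: Vec3,
    cam_to_world: Sequence[Sequence[float]],
) -> Optional[Vec3]:
    """Project pixel (u, v) onto the world ground plane z = 0.

    Hypothetical helper for illustration. Assumes a pinhole camera with
    focal lengths (fx, fy) and principal point (cx, cy), a camera located
    above the ground at cam_pos, and a 3x3 camera-to-world rotation matrix.
    Returns None when the pixel's ray does not point toward the ground.
    """
    # Back-project the pixel to a ray direction in the camera frame
    # (assumed convention: x right, y down, z along the optical axis).
    ray_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)

    # Rotate the ray into the world frame.
    ray_world = tuple(
        sum(row[k] * ray_cam[k] for k in range(3)) for row in cam_to_world
    )

    # A ray that is level or points upward never reaches the ground.
    if ray_world[2] >= 0.0:
        return None

    # Solve cam_pos.z + t * ray_world.z = 0 for the ray parameter t.
    t = -cam_pos[2] / ray_world[2]
    return (
        cam_pos[0] + t * ray_world[0],
        cam_pos[1] + t * ray_world[1],
        0.0,
    )


# Example: a camera 10 units above the point (2, 3), looking straight down.
LOOK_DOWN = [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]]
center = pixel_to_ground(320, 240, 400, 400, 320, 240, (2.0, 3.0, 10.0), LOOK_DOWN)
```

In this example the principal-point pixel lands directly below the camera. A map built this way stays fixed in the world frame as the vehicle moves, which is the property the abstract attributes to the explicit transformation.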

Code & Models

Repositories

lil-lab/drif (PyTorch, official)

Taxonomy

Methods

SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings