TL;DR
This paper presents GSMN, a neural network that maps images, instructions, and pose estimates to continuous velocity commands for quadcopter control by building an explicit semantic map. It is trained with DAggerFM and shows improved performance and interpretability in simulated environments.
Contribution
The paper introduces GSMN, a novel neural architecture that explicitly constructs semantic maps for instruction following on quadcopters, improving both performance and interpretability.
Findings
GSMN outperforms strong neural baselines in simulation.
Explicit mapping improves instruction-following accuracy.
Learned maps are interpretable and grounded in the environment.
Abstract
We introduce a method for following high-level navigation instructions by mapping directly from images, instructions, and pose estimates to continuous low-level velocity commands for real-time control. The Grounded Semantic Mapping Network (GSMN) is a fully-differentiable neural network architecture that builds an explicit semantic map in the world reference frame by incorporating a pinhole camera projection model within the network. The information stored in the map is learned from experience, while the local-to-world transformation is computed explicitly. We train the model using DAggerFM, a modified variant of DAgger that trades tabular convergence guarantees for improved training speed and memory use. We test GSMN in virtual environments on a realistic quadcopter simulator and show that incorporating explicit mapping and grounding modules allows GSMN to outperform strong neural…
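The abstract says GSMN builds its world-frame map by applying a pinhole camera projection with an explicitly computed local-to-world transformation. As a rough, non-differentiable illustration of that geometric step only, the sketch below back-projects pixels onto a flat ground plane (z = 0) and averages per-pixel feature vectors into world-frame grid cells. The frame conventions, the flat-ground assumption, mean pooling, and all names (`pixel_to_ground`, `world_to_cell`, `splat_features`) are hypothetical choices for this example; GSMN itself performs the projection inside a differentiable network on learned features, and its exact formulation is not given here.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Intrinsics = Tuple[float, float, float, float]  # (fx, fy, cx, cy): hypothetical layout
Cell = Tuple[int, int]


def matvec(m: Sequence[Sequence[float]], v: Vec3) -> Vec3:
    """Multiply a 3x3 matrix (given as a list of rows) by a 3-vector."""
    return (
        sum(m[0][c] * v[c] for c in range(3)),
        sum(m[1][c] * v[c] for c in range(3)),
        sum(m[2][c] * v[c] for c in range(3)),
    )


def pixel_to_ground(u: float, v: float, k: Intrinsics,
                    r_cam_to_world: Sequence[Sequence[float]],
                    cam_pos: Vec3) -> Optional[Vec3]:
    """Back-project pixel (u, v) through a pinhole camera onto the plane z = 0.

    Assumed conventions: camera x right, y down, z forward (OpenCV-style);
    world z up with flat ground at z = 0. Returns None if the ray misses the ground.
    """
    fx, fy, cx, cy = k
    # Pinhole model: direction of the ray through (u, v) at unit depth.
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)
    # Explicit local-to-world step: rotate the ray into the world frame.
    d_world = matvec(r_cam_to_world, d_cam)
    if d_world[2] >= 0.0:
        return None  # Ray is level or points upward, so it never reaches the ground.
    s = -cam_pos[2] / d_world[2]  # Solve cam_z + s * d_z = 0 for the ray parameter s.
    if s <= 0.0:
        return None  # Intersection would lie behind the camera.
    return (cam_pos[0] + s * d_world[0],
            cam_pos[1] + s * d_world[1],
            cam_pos[2] + s * d_world[2])


def world_to_cell(x: float, y: float, cell_size: float,
                  origin: Tuple[float, float] = (0.0, 0.0)) -> Cell:
    """Map a world-frame (x, y) position to integer grid-cell indices."""
    return (math.floor((x - origin[0]) / cell_size),
            math.floor((y - origin[1]) / cell_size))


def splat_features(pixels: Sequence[Tuple[float, float, Sequence[float]]],
                   k: Intrinsics, r_cam_to_world: Sequence[Sequence[float]],
                   cam_pos: Vec3, cell_size: float) -> Dict[Cell, List[float]]:
    """Average per-pixel feature vectors into the ground-plane cells they project to."""
    sums: Dict[Cell, List[float]] = {}
    counts: Dict[Cell, int] = {}
    for u, v, feat in pixels:
        p = pixel_to_ground(u, v, k, r_cam_to_world, cam_pos)
        if p is None:
            continue  # Skip pixels above the horizon.
        cell = world_to_cell(p[0], p[1], cell_size)
        acc = sums.setdefault(cell, [0.0] * len(feat))
        for i, f in enumerate(feat):
            acc[i] += f
        counts[cell] = counts.get(cell, 0) + 1
    # Mean pooling per cell (an illustrative choice, not taken from the paper).
    return {cell: [a / counts[cell] for a in acc] for cell, acc in sums.items()}


if __name__ == "__main__":
    # Hypothetical pose: camera 2 m above world point (3, 4), looking straight down.
    r_down = [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]]
    k = (100.0, 100.0, 64.0, 64.0)
    print(pixel_to_ground(64, 64, k, r_down, (3.0, 4.0, 2.0)))  # (3.0, 4.0, 0.0)
```

Mean pooling keeps the example short; a learned mapping model could weight or gate contributions to each cell differently, and a real drone pose would also include roll and pitch in the rotation matrix.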
