Learning Visuomotor Policies for Aerial Navigation Using Cross-Modal Representations

Rogerio Bonatti; Ratnesh Madaan; Vibhav Vineet; Sebastian Scherer; Ashish Kapoor

arXiv:1909.06993·cs.CV·March 10, 2020

TL;DR

This paper introduces a novel cross-modal variational autoencoder architecture that learns robust visuomotor policies for drone navigation, trained solely on simulated data and successfully transferred to real-world scenarios.

Contribution

The work presents a new cross-modal architecture combining supervised and unsupervised data, enabling simulation-trained policies to generalize effectively to real-world drone navigation tasks.

Findings

1. Significantly improved control performance over end-to-end methods
2. Successful real-world drone navigation through gates in various conditions
3. Effective transfer of policies from simulation to real environment

Abstract

Machines are a long way from robustly solving open-world perception-control tasks, such as first-person view (FPV) aerial navigation. While recent advances in end-to-end Machine Learning, especially Imitation and Reinforcement Learning, appear promising, they are constrained by the need for large amounts of difficult-to-collect labeled real-world data. Simulated data, on the other hand, is easy to generate, but generally does not render safe behaviors in diverse real-life scenarios. In this work we propose a novel method for learning robust visuomotor policies for real-world deployment which can be trained purely with simulated data. We develop rich state representations that combine supervised and unsupervised environment data. Our approach takes a cross-modal perspective, where separate modalities correspond to the raw camera data and the system states relevant to the task, such as the…
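
As a rough illustration of the cross-modal idea described above, the sketch below encodes a camera observation into a shared latent vector and decodes that vector into two modalities: the image itself and a task-relevant state. It is not the authors' implementation. Single linear layers stand in for the real networks, and every name, dimension, and weighting here (`cross_modal_loss`, `make_params`, `beta`, the 4-value state) is a hypothetical choice made for illustration. Passing `state=None` models an unlabeled sample, which is one simple way to combine supervised and unsupervised data in one objective.

```python
import math
import random


def init_layer(rng, n_out, n_in, scale=0.1):
    """Create a (weights, bias) pair for a toy linear layer."""
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(weights, bias, x):
    """Compute y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def make_params(rng, image_dim, state_dim, latent_dim):
    """Hypothetical parameter layout: one encoder, two modality decoders."""
    return {
        "enc_mu": init_layer(rng, latent_dim, image_dim),
        "enc_log_var": init_layer(rng, latent_dim, image_dim),
        "dec_image": init_layer(rng, image_dim, latent_dim),
        "dec_state": init_layer(rng, state_dim, latent_dim),
    }


def kl_to_standard_normal(mu, log_var):
    """KL divergence from N(mu, diag(exp(log_var))) to N(0, I)."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cross_modal_loss(params, image, state, rng, beta=1.0):
    """Return (loss, z) for one sample; state may be None if unlabeled."""
    mu = linear(*params["enc_mu"], image)
    log_var = linear(*params["enc_log_var"], image)
    # Reparameterization: z = mu + sigma * eps, with eps ~ N(0, 1).
    z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]
    # Image reconstruction and KL terms apply to every sample.
    loss = mse(linear(*params["dec_image"], z), image)
    loss += beta * kl_to_standard_normal(mu, log_var)
    # The state term is added only when a label is available.
    if state is not None:
        loss += mse(linear(*params["dec_state"], z), state)
    return loss, z


if __name__ == "__main__":
    params = make_params(random.Random(0), image_dim=8, state_dim=4, latent_dim=3)
    image = [0.5] * 8              # stand-in for flattened camera pixels
    state = [1.0, 0.0, 0.0, 0.0]   # hypothetical 4-value task state
    print(cross_modal_loss(params, image, state, random.Random(1))[0])
    print(cross_modal_loss(params, image, None, random.Random(1))[0])
```

In a setup like this, a control policy would be trained on the latent vector `z` rather than on raw pixels, so the policy never sees the image directly.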

Code & Models

Videos

Learning Visuomotor Policies for Aerial Navigation Using Cross-Modal Representations (YouTube)