TL;DR
This paper introduces a novel cross-modal variational autoencoder architecture that learns robust visuomotor policies for drone navigation, trained solely on simulated data and successfully transferred to real-world scenarios.
Contribution
The work presents a new cross-modal architecture combining supervised and unsupervised data, enabling simulation-trained policies to generalize effectively to real-world drone navigation tasks.
Findings
Significantly improved control performance over end-to-end methods
Successful real-world drone navigation through gates in various conditions
Effective transfer of policies from simulation to real environment
Abstract
Machines are a long way from robustly solving open-world perception-control tasks, such as first-person view (FPV) aerial navigation. While recent advances in end-to-end Machine Learning, especially Imitation and Reinforcement Learning, appear promising, they are constrained by the need for large amounts of difficult-to-collect labeled real-world data. Simulated data, on the other hand, is easy to generate, but generally does not render safe behaviors in diverse real-life scenarios. In this work, we propose a novel method for learning robust visuomotor policies for real-world deployment which can be trained purely with simulated data. We develop rich state representations that combine supervised and unsupervised environment data. Our approach takes a cross-modal perspective, where separate modalities correspond to the raw camera data and the system states relevant to the task, such as the…
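To make the cross-modal idea concrete, here is a minimal sketch of how a single latent code might be trained from both unlabeled and labeled samples. It is an illustration under stated assumptions, not the authors' implementation: the linear maps stand in for real encoder and decoder networks, the dimensions, the `beta` weight, and every name (`encode`, `cross_modal_loss`, `W_state`, and so on) are hypothetical, and the "state" vector is a placeholder for whichever task-relevant system states the paper uses.

```python
import numpy as np

# Hypothetical sizes: flattened image, task-relevant state, latent code.
IMG_DIM, STATE_DIM, LATENT_DIM = 64, 4, 8

# Linear maps stand in for the real encoder/decoder networks (assumption).
_init = np.random.default_rng(0)
W_enc = _init.normal(scale=0.1, size=(IMG_DIM, 2 * LATENT_DIM))
W_img = _init.normal(scale=0.1, size=(LATENT_DIM, IMG_DIM))
W_state = _init.normal(scale=0.1, size=(LATENT_DIM, STATE_DIM))


def encode(image):
    """Map flattened images to the mean and log-variance of q(z | image)."""
    h = image @ W_enc
    return h[..., :LATENT_DIM], h[..., LATENT_DIM:]


def sample(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """KL(q(z | image) || N(0, I)) for a diagonal Gaussian, per sample."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=-1)


def cross_modal_loss(image, state=None, beta=1.0, rng=None):
    """Image reconstruction + KL for every sample; the state decoder
    contributes only when a supervised state label is available."""
    rng = np.random.default_rng() if rng is None else rng
    mu, log_var = encode(image)
    z = sample(mu, log_var, rng)
    loss = np.sum((z @ W_img - image) ** 2, axis=-1)          # image modality
    loss = loss + beta * kl_to_standard_normal(mu, log_var)   # latent prior
    if state is not None:
        loss = loss + np.sum((z @ W_state - state) ** 2, axis=-1)  # state modality
    return float(loss.mean())


images = np.random.default_rng(1).normal(size=(5, IMG_DIM))
states = np.random.default_rng(2).normal(size=(5, STATE_DIM))
unlabeled = cross_modal_loss(images, rng=np.random.default_rng(3))
labeled = cross_modal_loss(images, states, rng=np.random.default_rng(3))
```

In this sketch, unlabeled images still shape the latent space through the reconstruction and KL terms, while labeled samples add a state-prediction term on the same latent code. A control policy could then be trained on the latent code rather than on raw pixels, which is one plausible reading of how such a representation supports sim-to-real transfer.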
Videos
Learning Visuomotor Policies for Aerial Navigation Using Cross-Modal Representations (YouTube)
