Augmenting Imitation Experience via Equivariant Representations
Dhruv Sharma, Alihusein Kuwajerwala, Florian Shkurti

TL;DR
This paper introduces a novel data augmentation method for visual navigation that extrapolates viewpoint embeddings using equivariant representations, improving policy robustness and reducing interventions in both simulation and real-world experiments.
Contribution
It proposes a new augmentation technique based on equivariant embeddings that leverages the geometry of visual navigation, enhancing policy training beyond standard augmentation methods.
Findings
Reduced cross-track error in policies trained with the new augmentation.
Fewer interventions needed for navigation policies trained with the proposed method.
Successful real-world navigation over 500 meters using the augmented training approach.
Abstract
The robustness of visual navigation policies trained through imitation often hinges on the augmentation of the training image-action pairs. Traditionally, this has been done by collecting data from multiple cameras, by using standard data augmentations from computer vision, such as adding random noise to each image, or by synthesizing training images. In this paper we show that there is another practical alternative for data augmentation for visual navigation based on extrapolating viewpoint embeddings and actions near the ones observed in the training data. Our method makes use of the geometry of the visual navigation problem in 2D and 3D and relies on policies that are functions of equivariant embeddings, as opposed to images. Given an image-action pair from a training navigation dataset, our neural network model predicts the latent representations of images at nearby viewpoints,…
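To make the idea of "extrapolating viewpoint embeddings and actions" concrete, here is a minimal toy sketch, not the authors' implementation. It assumes a hand-built equivariant latent (a set of 2D points in the camera frame) standing in for the learned embedding the paper's network predicts, so that a planar camera displacement g = (dx, dy, dtheta) acts on the latent by a known rigid transform. The corrective-action rule (a clipped proportional controller with gains `k_y`, `k_theta`) and all function names are hypothetical illustrations of how a label could be assigned to each synthesized viewpoint.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]
Sample = Tuple[List[Point], float]


def transform_embedding(z: List[Point], dx: float, dy: float, dtheta: float) -> List[Point]:
    """Toy group action on the latent: move the camera by (dx, dy, dtheta).

    Each point p, expressed in the old camera frame, is re-expressed in the new
    frame as R(-dtheta) @ (p - t). A learned equivariant encoder would instead
    apply a learned representation of the same displacement (assumption).
    """
    c, s = math.cos(-dtheta), math.sin(-dtheta)
    out = []
    for px, py in z:
        qx, qy = px - dx, py - dy  # translate into the displaced origin
        out.append((c * qx - s * qy, s * qx + c * qy))  # then undo the rotation
    return out


def corrective_action(a: float, dy: float, dtheta: float,
                      k_y: float = 0.5, k_theta: float = 1.0,
                      max_steer: float = 0.6) -> float:
    """Hypothetical label for a perturbed viewpoint: steer back toward the path.

    Convention: positive dy = shifted left, positive dtheta = rotated left,
    positive steering = turn left, so both offsets push the command to the right.
    """
    steer = a - k_y * dy - k_theta * dtheta
    return max(-max_steer, min(max_steer, steer))


def augment(z: List[Point], a: float, n: int, rng: random.Random,
            max_dy: float = 0.3, max_dtheta: float = 0.2) -> List[Sample]:
    """Synthesize n (latent, action) pairs at viewpoints near the observed one."""
    samples = []
    for _ in range(n):
        dy = rng.uniform(-max_dy, max_dy)
        dtheta = rng.uniform(-max_dtheta, max_dtheta)
        z_new = transform_embedding(z, 0.0, dy, dtheta)
        samples.append((z_new, corrective_action(a, dy, dtheta)))
    return samples


if __name__ == "__main__":
    observed_latent = [(2.0, 0.5), (3.0, -0.5), (4.0, 0.0)]
    for latent, action in augment(observed_latent, a=0.0, n=3, rng=random.Random(0)):
        print([(round(x, 2), round(y, 2)) for x, y in latent], round(action, 3))
```

The point of the sketch is that augmentation happens entirely in latent space: no new images are rendered or captured, because equivariance lets a known viewpoint change be applied directly to the embedding, and the policy is then trained on the extra (embedding, action) pairs.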
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning
