Offline Visual Representation Learning for Embodied Navigation

Karmesh Yadav; Ram Ramrakhya; Arjun Majumdar; Vincent-Pierre Berges; Sachit Kuhar; Dhruv Batra; Alexei Baevski; Oleksandr Maksymets

arXiv:2204.13226 · cs.CV · April 29, 2022 · 24 cites

TL;DR

This paper proposes a two-stage offline pretraining and online finetuning approach for visual representations in embodied navigation, significantly improving performance across multiple datasets and tasks.

Contribution

The paper introduces Offline Visual Representation Learning (OVRL), a novel two-stage method combining self-supervised pretraining with online finetuning for embodied agents.

Findings

01. Pretraining with SSL improves navigation success rates substantially.

02. The method generalizes well to unseen datasets and tasks.

03. Performance gains increase with longer training schedules.

Abstract

How should we learn visual representations for embodied agents that must see and move? The status quo is tabula rasa in vivo, i.e. learning visual representations from scratch while also learning to move, potentially augmented with auxiliary tasks (e.g. predicting the action taken between two successive observations). In this paper, we show that an alternative 2-stage strategy is far more effective: (1) offline pretraining of visual representations with self-supervised learning (SSL) using large-scale pre-rendered images of indoor environments (Omnidata), and (2) online finetuning of visuomotor representations on specific tasks with image augmentations under long learning schedules. We call this method Offline Visual Representation Learning (OVRL). We conduct large-scale experiments - on 3 different 3D datasets (Gibson, HM3D, MP3D), 2 tasks (ImageNav, ObjectNav), and 2 policy learning…
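Both stages described in the abstract rely on image augmentation: SSL pretraining typically compares differently augmented views of the same frame, and the finetuning stage applies augmentations to the agent's observations. The abstract does not say which augmentations are used, so the following is only a minimal sketch of one common choice, a random shift (pad, then crop). The function names `random_shift` and `two_views`, the padding size, and the 64 x 64 frame are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def random_shift(image, pad, rng):
    """Pad an H x W x C image by replicating its edges, then crop a random H x W window.

    Assumption: this is a generic augmentation, not necessarily the one used by OVRL.
    """
    h, w, _ = image.shape
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    # Offsets range over [0, 2 * pad], so the crop can move up to `pad` pixels each way.
    top = int(rng.integers(0, 2 * pad + 1))
    left = int(rng.integers(0, 2 * pad + 1))
    return padded[top:top + h, left:left + w]


def two_views(image, pad, rng):
    """Return two independently augmented views of one frame (hypothetical helper)."""
    return random_shift(image, pad, rng), random_shift(image, pad, rng)


rng = np.random.default_rng(0)
frame = rng.random((64, 64, 3))  # stand-in for one pre-rendered RGB frame
view_a, view_b = two_views(frame, pad=4, rng=rng)
print(view_a.shape, view_b.shape)
```

In a pipeline of this kind, a pair such as `view_a` and `view_b` would feed a self-supervised objective during offline pretraining, while a single call to `random_shift` per observation would serve during online finetuning. Edge replication keeps every output pixel within the value range of the input frame.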

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition