Visual Pre-training for Navigation: What Can We Learn from Noise?

Yanwei Wang; Ching-Yun Ko; Pulkit Agrawal

arXiv:2207.00052·cs.CV·July 28, 2023

TL;DR

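This paper introduces a self-supervised approach for visual navigation that learns a representation by predicting the location and size of the crop of the current view that corresponds to the goal view. Trained purely on synthetic noise images, the representation transfers to natural home images and supports efficient policy learning with little interaction data.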

Contribution

The paper demonstrates that crop prediction trained on synthetic noise images transfers to natural images, providing a new self-supervised pre-training method for visual navigation.

Findings

01. Self-supervised crop prediction transfers from noise to natural images.

02. Representation learned from noise images enables efficient navigation policy training.

03. Method reduces data requirements for visual navigation systems.

Abstract

One powerful paradigm in visual navigation is to predict actions from observations directly. Training such an end-to-end system allows representations useful for downstream tasks to emerge automatically. However, the lack of inductive bias makes this system data inefficient. We hypothesize a sufficient representation of the current view and the goal view for a navigation policy can be learned by predicting the location and size of a crop of the current view that corresponds to the goal. We further show that training such random crop prediction in a self-supervised fashion purely on synthetic noise images transfers well to natural home images. The learned representation can then be bootstrapped to learn a navigation policy efficiently with little interaction data. The code is available at https://yanweiw.github.io/noise2ptz
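The self-supervised setup described in the abstract can be illustrated with a minimal sketch of how one training example might be generated: draw a noise image, cut a random crop from it, and record where the crop came from as the prediction target. This is not the authors' implementation. The function names, the use of uniform pixel noise, and the label encoding (normalized crop center plus width and height fractions) are all assumptions made for illustration; the official repository should be consulted for the actual noise generators and target parameterization.

```python
import random


def make_noise_image(height, width, rng):
    """Return a height x width grayscale image of uniform noise in [0, 1).

    Assumption: uniform noise is only a stand-in for the synthetic
    noise images mentioned in the abstract.
    """
    return [[rng.random() for _ in range(width)] for _ in range(height)]


def sample_crop(image, rng, min_frac=0.3, max_frac=0.9):
    """Cut a random crop and return it with its location/size label.

    The label (hypothetical encoding) is (center_x, center_y,
    width_fraction, height_fraction), all normalized to [0, 1].
    """
    h, w = len(image), len(image[0])
    frac = rng.uniform(min_frac, max_frac)
    crop_h = max(1, int(h * frac))
    crop_w = max(1, int(w * frac))
    # randint is inclusive on both ends, so the crop always fits.
    top = rng.randint(0, h - crop_h)
    left = rng.randint(0, w - crop_w)
    crop = [row[left:left + crop_w] for row in image[top:top + crop_h]]
    label = (
        (left + crop_w / 2) / w,
        (top + crop_h / 2) / h,
        crop_w / w,
        crop_h / h,
    )
    return crop, label


def make_training_pair(height=64, width=64, seed=None):
    """Build one (current view, goal crop, label) example from noise."""
    rng = random.Random(seed)
    image = make_noise_image(height, width, rng)
    crop, label = sample_crop(image, rng)
    return image, crop, label


if __name__ == "__main__":
    image, crop, label = make_training_pair(seed=0)
    print(len(image), len(image[0]), len(crop), len(crop[0]), label)
```

In this sketch the full image plays the role of the current view and the crop plays the role of the goal view; a model would take both as input and regress the label. Because the label comes for free from the cropping step, no human annotation or environment interaction is needed at this stage.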

Code & Models

Repositories

yanweiw/noise2ptz
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning