Visual Pre-training for Navigation: What Can We Learn from Noise?
Yanwei Wang, Ching-Yun Ko, Pulkit Agrawal

TL;DR
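This paper introduces a self-supervised approach for visual navigation: a network trained purely on synthetic noise images learns to predict where the goal view appears as a crop of the current view, and the resulting representation enables efficient policy learning with little real-world interaction data.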
Contribution
It demonstrates that a representation trained purely on synthetic noise images transfers to natural home images, providing a new self-supervised pre-training method for visual navigation.
Findings
Self-supervised crop prediction transfers from noise to natural images.
The representation learned from noise images enables efficient navigation policy training.
The method reduces the amount of interaction data needed to train a visual navigation policy.
Abstract
One powerful paradigm in visual navigation is to predict actions from observations directly. Training such an end-to-end system allows representations useful for downstream tasks to emerge automatically. However, the lack of inductive bias makes this system data inefficient. We hypothesize that a sufficient representation of the current view and the goal view for a navigation policy can be learned by predicting the location and size of a crop of the current view that corresponds to the goal. We further show that training such random crop prediction in a self-supervised fashion purely on synthetic noise images transfers well to natural home images. The learned representation can then be bootstrapped to learn a navigation policy efficiently with little interaction data. The code is available at https://yanweiw.github.io/noise2ptz
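The self-supervised pretext task described in the abstract can be sketched as a data generator: render a synthetic noise image as the "current view", cut out a random crop, resize it to full resolution as the "goal view", and use the crop's location and size as the regression target. This is a minimal sketch under stated assumptions, not the authors' implementation. The noise generator (a sum of upsampled random grids), the label parameterization (normalized center and side length of a square crop), the scale range, and the function names make_noise_image, resize_nearest and sample_crop_pair are all hypothetical.

```python
import numpy as np


def make_noise_image(size=128, octaves=4, rng=None):
    """Hypothetical multi-scale noise: sum of nearest-upsampled random grids.
    The noise family used in the paper may differ."""
    rng = np.random.default_rng() if rng is None else rng
    assert size % 2 ** (octaves + 1) == 0, "size must be divisible by the finest grid"
    img = np.zeros((size, size, 3))
    for o in range(octaves):
        cells = 2 ** (o + 2)            # grid resolution for this octave: 4, 8, 16, ...
        grid = rng.random((cells, cells, 3))
        reps = size // cells
        # Upsample the coarse grid to full size and add it with decreasing weight.
        img += np.repeat(np.repeat(grid, reps, axis=0), reps, axis=1) / 2 ** o
    img -= img.min()
    img /= img.max()                    # normalize pixel values to [0, 1]
    return img


def resize_nearest(img, out):
    """Nearest-neighbor resize of an (h, w, c) array to (out, out, c)."""
    h, w = img.shape[:2]
    rows = (np.arange(out) * h / out).astype(int)
    cols = (np.arange(out) * w / out).astype(int)
    return img[rows][:, cols]


def sample_crop_pair(current, min_scale=0.25, max_scale=0.9, rng=None):
    """Return (goal_view, label) where goal_view is a random square crop of
    `current` resized to full resolution, and label = (cx, cy, side), all
    normalized by the image size (hypothetical parameterization)."""
    rng = np.random.default_rng() if rng is None else rng
    size = current.shape[0]
    crop = max(1, int(round(rng.uniform(min_scale, max_scale) * size)))
    y0 = int(rng.integers(0, size - crop + 1))
    x0 = int(rng.integers(0, size - crop + 1))
    goal = resize_nearest(current[y0:y0 + crop, x0:x0 + crop], size)
    label = np.array([(x0 + crop / 2) / size, (y0 + crop / 2) / size, crop / size])
    return goal, label


# Usage: one self-supervised training pair built entirely from noise.
rng = np.random.default_rng(0)
current = make_noise_image(rng=rng)
goal, label = sample_crop_pair(current, rng=rng)
print(current.shape, goal.shape)  # (128, 128, 3) (128, 128, 3)
```

A network taking (current, goal) as input and regressing label would then be trained on many such pairs; because both views and the target are generated on the fly, no real images or labels are needed for this pre-training stage.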
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
