End-to-End (Instance)-Image Goal Navigation through Correspondence as an Emergent Phenomenon

Guillaume Bono; Leonid Antsfeld; Boris Chidlovskii; Philippe Weinzaepfel; Christian Wolf

arXiv:2309.16634 · cs.CV · September 29, 2023 · 1 citation

TL;DR

This paper introduces a novel approach for image goal navigation that leverages pretext tasks and a dual encoder model to improve visual correspondence understanding, achieving state-of-the-art results in complex, unseen environments.

Contribution

The authors propose a dual encoder with a large-capacity binocular ViT and a two-step pretext training process to enhance visual correspondence and navigation performance.

Findings

01 Significant improvements on ImageNav benchmarks.

02 State-of-the-art performance on Instance-ImageNav with varying camera parameters.

03 Emergence of correspondence solutions from training signals.

Abstract

Most recent work in goal-oriented visual navigation resorts to large-scale machine learning in simulated environments. The main challenge lies in learning compact representations that generalize to unseen environments and in learning high-capacity perception modules capable of reasoning on high-dimensional input. The latter is particularly difficult when the goal is given not as a category ("ObjectNav") but as an exemplar image ("ImageNav"), as the perception module needs to learn a comparison strategy that requires solving an underlying visual correspondence problem. This has been shown to be difficult from reward alone or with standard auxiliary tasks. We address this problem through a sequence of two pretext tasks, which serve as a prior for what we argue is one of the main bottlenecks in perception: extremely wide-baseline relative pose estimation and visibility prediction in complex…

Code & Models

Repositories

naver/debit
pytorch

Videos

End-to-End (Instance)-Image Goal Navigation through Correspondence as an Emergent Phenomenon · slideslive

Taxonomy

Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques