Learning Navigational Visual Representations with Semantic Map Supervision

Yicong Hong; Yang Zhou; Ruiyi Zhang; Franck Dernoncourt; Trung Bui; Stephen Gould; Hao Tan

arXiv:2307.12335 · cs.CV · July 25, 2023 · 1 cites

TL;DR

This paper introduces Ego$^2$-Map, a novel visual representation learning method that leverages semantic maps to enhance indoor robot navigation, outperforming existing pre-training approaches and setting new state-of-the-art results.

Contribution

The paper proposes a navigation-specific visual representation learning approach that trains a visual transformer with semantic map supervision, improving navigation performance in indoor environments.

Findings

01 Outperforms recent visual pre-training methods in object-goal navigation.

02 Achieves a 47% success rate and 41% SPL (Success weighted by Path Length) on vision-and-language navigation tasks.

03 Enhances egocentric visual representations with semantic and spatial information.
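
The success rate and SPL figures quoted above are standard navigation metrics. As a rough illustration only, the sketch below computes both using the commonly used definition of SPL (Success weighted by Path Length), in which each successful episode contributes the ratio of the shortest-path length to the length of the path actually taken. The function names and the episode values are hypothetical and are not taken from the paper or its repository.

```python
def success_rate(successes):
    """Fraction of episodes in which the agent reached the goal."""
    if not successes:
        raise ValueError("need at least one episode")
    return sum(1 for s in successes if s) / len(successes)


def spl(successes, shortest_lengths, path_lengths):
    """Success weighted by Path Length, averaged over episodes.

    Assumption: every shortest-path length is strictly positive.
    """
    if not (len(successes) == len(shortest_lengths) == len(path_lengths)):
        raise ValueError("all inputs must have the same length")
    if not successes:
        raise ValueError("need at least one episode")
    total = 0.0
    for ok, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        if ok:
            # Efficiency is 1.0 for an optimal path and shrinks as the
            # agent's path grows longer than the shortest one.
            total += shortest / max(taken, shortest)
    return total / len(successes)


# Hypothetical episodes: three successes (one with a detour), one failure.
successes = [True, True, False, True]
shortest = [10.0, 8.0, 5.0, 4.0]
taken = [10.0, 16.0, 7.0, 4.0]

sr = success_rate(successes)          # 3 / 4 = 0.75
score = spl(successes, shortest, taken)  # (1.0 + 0.5 + 0.0 + 1.0) / 4 = 0.625
```

Because failed episodes contribute zero and detours are penalized, SPL can never exceed the success rate, which is consistent with the 41% SPL sitting below the 47% success rate.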

Abstract

Being able to perceive the semantics and the spatial structure of the environment is essential for visual navigation of a household robot. However, most existing works only employ visual backbones pre-trained either with independent images for classification or with self-supervised learning methods to adapt to the indoor navigation domain, neglecting the spatial relationships that are essential to the learning of navigation. Inspired by the behavior that humans naturally build semantically and spatially meaningful cognitive maps in their brains during navigation, in this paper, we propose a novel navigational-specific visual representation learning method by contrasting the agent's egocentric views and semantic maps (Ego$^{2}$-Map). We apply the visual transformer as the backbone encoder and train the model with data collected from the large-scale Habitat-Matterport3D environments.…
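
The abstract describes the method as contrasting egocentric views with semantic maps but does not spell out the loss in the text shown here. The sketch below is one common way to set up such a contrast, a symmetric InfoNCE objective over paired embeddings, and should be read as an assumption rather than the paper's actual formulation. The function names, the temperature value, and the toy two-dimensional embeddings are all hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _diagonal_cross_entropy(logits):
    """Mean cross-entropy where row i's correct class is column i."""
    loss = 0.0
    for i, row in enumerate(logits):
        peak = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in row))
        loss += log_sum_exp - row[i]
    return loss / len(logits)


def symmetric_info_nce(view_embs, map_embs, temperature=0.1):
    """Symmetric InfoNCE over (egocentric view, semantic map) embedding pairs.

    Pair i in view_embs is assumed to correspond to pair i in map_embs;
    every other map in the batch acts as a negative, and vice versa.
    """
    views = [l2_normalize(v) for v in view_embs]
    maps = [l2_normalize(m) for m in map_embs]
    logits = [
        [sum(a * b for a, b in zip(v, m)) / temperature for m in maps]
        for v in views
    ]
    transposed = [list(col) for col in zip(*logits)]
    view_to_map = _diagonal_cross_entropy(logits)
    map_to_view = _diagonal_cross_entropy(transposed)
    return 0.5 * (view_to_map + map_to_view)


# Toy embeddings: matched pairs should give a much lower loss than mismatched ones.
toy_views = [[1.0, 0.0], [0.0, 1.0]]
aligned_maps = [[1.0, 0.0], [0.0, 1.0]]
swapped_maps = [[0.0, 1.0], [1.0, 0.0]]

aligned_loss = symmetric_info_nce(toy_views, aligned_maps)
swapped_loss = symmetric_info_nce(toy_views, swapped_maps)
```

In a real pipeline the embeddings would come from the view encoder (the visual transformer mentioned in the abstract) and a separate map encoder, and the loss would be computed on batched tensors in a deep learning framework rather than on Python lists.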

Code & Models

Repositories

yiconghong/ego2map-navit (Official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Semi-Pseudo-Label