Learning Navigational Visual Representations with Semantic Map Supervision
Yicong Hong, Yang Zhou, Ruiyi Zhang, Franck Dernoncourt, Trung Bui, Stephen Gould, Hao Tan

TL;DR
This paper introduces Ego$^2$-Map, a novel visual representation learning method that leverages semantic maps to enhance indoor robot navigation, outperforming existing pre-training approaches and setting new state-of-the-art results.
Contribution
It proposes a navigation-specific visual representation learning approach that uses semantic map supervision with a vision transformer backbone, improving navigation performance in indoor environments.
Findings
Outperforms recent visual pre-training methods in object-goal navigation.
Achieves new state-of-the-art results of 47% success rate (SR) and 41% SPL on the test server for vision-and-language navigation in continuous environments.
Enhances egocentric visual representations with semantic and spatial information.
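The findings are reported as success rate and SPL (Success weighted by Path Length), the standard embodied-navigation metrics. The sketch below shows how these metrics are usually computed over a set of evaluation episodes. It assumes that per-episode success flags, shortest-path distances, and agent path lengths are already available. The function names and episode values are hypothetical, and this is not the benchmark's official evaluation code.

```python
from typing import Sequence


def success_rate(successes: Sequence[bool]) -> float:
    """Fraction of episodes in which the agent stopped at the goal."""
    return sum(1 for s in successes if s) / len(successes)


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length.

    SPL = (1 / N) * sum_i S_i * l_i / max(p_i, l_i), where S_i is the binary
    success flag, l_i the shortest-path distance from start to goal (assumed
    > 0) and p_i the length of the path the agent actually travelled.
    """
    if not (len(successes) == len(shortest_lengths) == len(path_lengths)):
        raise ValueError("all inputs must have one entry per episode")
    total = 0.0
    for s, l, p in zip(successes, shortest_lengths, path_lengths):
        if s:  # failed episodes contribute 0
            total += l / max(p, l)
    return total / len(successes)


# Four hypothetical episodes: optimal success, detour success, two failures.
episodes_success = [True, True, False, False]
episodes_shortest = [10.0, 8.0, 5.0, 6.0]
episodes_path = [10.0, 16.0, 7.0, 6.0]
print(success_rate(episodes_success))  # 0.5
print(spl(episodes_success, episodes_shortest, episodes_path))  # 0.375
```

In this example, two of four episodes succeed: one follows the shortest path and the other travels twice the shortest distance. SR is therefore 0.5 and SPL is (1 + 0.5) / 4 = 0.375, which shows how SPL discounts inefficient successes.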
Abstract
Being able to perceive the semantics and the spatial structure of the environment is essential for visual navigation of a household robot. However, most existing works only employ visual backbones pre-trained either on independent images for classification or with self-supervised learning methods to adapt to the indoor navigation domain, neglecting the spatial relationships that are essential to learning navigation. Inspired by the way humans naturally build semantically and spatially meaningful cognitive maps in their brains during navigation, in this paper, we propose a novel navigation-specific visual representation learning method that contrasts the agent's egocentric views with semantic maps (Ego$^2$-Map). We apply a vision transformer as the backbone encoder and train the model with data collected from the large-scale Habitat-Matterport3D environments.…
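The abstract describes learning by contrasting egocentric views with semantic maps. As a rough illustration only, the sketch below implements a generic, CLIP-style symmetric InfoNCE loss over a batch of paired view and map embeddings. Each view is pulled toward its own map and pushed away from the other maps in the batch. The function names, the cosine-similarity scoring, and the temperature of 0.1 are assumptions made for illustration. This is not the authors' code, and the actual Ego$^2$-Map objective and encoders may differ.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def _l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # avoid dividing by zero
    return [x / norm for x in v]


def _mean_diagonal_cross_entropy(logits: Matrix) -> float:
    """Mean cross-entropy where row i's correct class is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def view_map_contrastive_loss(
    view_embeddings: Sequence[Sequence[float]],
    map_embeddings: Sequence[Sequence[float]],
    temperature: float = 0.1,  # assumed value, not taken from the paper
) -> float:
    """Symmetric InfoNCE loss; view i and map i form the positive pair."""
    views = [_l2_normalize(v) for v in view_embeddings]
    maps = [_l2_normalize(m) for m in map_embeddings]
    # logits[i][j] = cosine(view_i, map_j) / temperature
    logits = [
        [sum(a * b for a, b in zip(v, m)) / temperature for m in maps]
        for v in views
    ]
    transposed = [list(col) for col in zip(*logits)]
    # Average the view->map and map->view directions.
    return 0.5 * (
        _mean_diagonal_cross_entropy(logits)
        + _mean_diagonal_cross_entropy(transposed)
    )


# Toy 2-D embeddings standing in for encoder outputs (hypothetical).
views = [[1.0, 0.0], [0.0, 1.0]]
aligned_maps = [[1.0, 0.0], [0.0, 1.0]]
swapped_maps = [[0.0, 1.0], [1.0, 0.0]]
print(view_map_contrastive_loss(views, aligned_maps))  # close to 0
print(view_map_contrastive_loss(views, swapped_maps))  # about 10
```

The symmetric form weights both retrieval directions (view to map and map to view) equally. In practice, the embeddings would come from the trained view and map encoders rather than hand-written vectors.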
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: Semi-Pseudo-Label
