VANP: Learning Where to See for Navigation with Self-Supervised Vision-Action Pre-Training

Mohammad Nazeri, Junzhe Wang, Amirreza Payandeh, and Xuesu Xiao

arXiv:2403.08109 · cs.RO · January 3, 2025 · 1 citation

TL;DR

VANP introduces a self-supervised vision-action pre-training method that enables robots to focus on navigation-relevant visual regions, reducing training time and data requirements compared to traditional supervised approaches.

Contribution

This work presents VANP, a novel self-supervised model that learns navigation-specific visual features using mutual information maximization, without relying on large labeled datasets.
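The contribution statement above says VANP learns navigation-specific features through mutual information maximization, but this page does not spell out the objective. As a minimal sketch, assuming a contrastive InfoNCE estimator (a standard lower bound on mutual information, not necessarily the loss VANP actually uses), the snippet below scores a batch of paired embeddings, for example an embedding of recent camera frames and an embedding of the actions that followed them. The function and parameter names (`info_nce`, `temperature`) are hypothetical and are not taken from the official repository.

```python
import math
from typing import List

Vector = List[float]


def _normalize(v: Vector) -> Vector:
    # Scale to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(vision: List[Vector], action: List[Vector], temperature: float = 0.1) -> float:
    """Hypothetical InfoNCE loss over a batch of paired embeddings.

    Assumption: vision[i] and action[i] come from the same time step (positive pair);
    every action[j] with j != i serves as a negative for vision[i].
    Minimizing the loss maximizes the bound I(vision; action) >= log(N) - loss.
    """
    if len(vision) != len(action) or not vision:
        raise ValueError("need equally sized, non-empty batches")
    v = [_normalize(x) for x in vision]
    a = [_normalize(y) for y in action]
    total = 0.0
    for i, vi in enumerate(v):
        # Temperature-scaled similarity of vision[i] to every action in the batch.
        logits = [sum(p * q for p, q in zip(vi, aj)) / temperature for aj in a]
        # Cross-entropy with the matching action as target, via a stable log-sum-exp.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denominator - logits[i]
    return total / len(v)


aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)  # True
```

Correctly paired embeddings give a loss near zero, while mismatched pairs are penalized heavily, which is the pressure that pushes the visual encoder toward features that predict the robot's actions rather than generic salient objects.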

Findings

01

VANP achieves navigation performance comparable to models learned end-to-end, with half the training time.

02

VANP matches models trained on the large-scale, fully supervised ImageNet dataset while using only 0.08% as much data.

03

Features learned by VANP align with human navigation intuition.

Abstract

Humans excel at efficiently navigating through crowds without collision by focusing on specific visual regions relevant to navigation. However, most robotic visual navigation methods rely on deep learning models pre-trained on vision tasks, which prioritize salient objects, not necessarily relevant to navigation and potentially misleading. Alternative approaches train specialized navigation models from scratch, requiring significant computation. On the other hand, self-supervised learning has revolutionized computer vision and natural language processing, but its application to robotic navigation remains underexplored due to the difficulty of defining effective self-supervision signals. Motivated by these observations, in this work, we propose a Self-Supervised Vision-Action Model for Visual Navigation Pre-Training (VANP). Instead of detecting salient objects that are beneficial for…

Code & Models

Repositories

mhnazeri/vanp
PyTorch · Official

Taxonomy

Topics: Robotics and Sensor-Based Localization · Robotics and Automated Systems

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer · Layer Normalization · Absolute Position Encodings · Dropout · Softmax · Residual Connection · Dense Connections