BEVNav: Robot Autonomous Navigation Via Spatial-Temporal Contrastive Learning in Bird's-Eye View

Jiahao Jiang; Yuxiang Yang; Yingqi Deng; Chenlong Ma; Jing Zhang

arXiv:2409.01646·cs.RO·September 4, 2024

TL;DR

BEVNav learns bird's-eye view (BEV) representations of point clouds with a self-supervised spatial-temporal contrastive objective and uses them, through deep reinforcement learning, for goal-driven robot navigation in map-less environments.

Contribution

BEVNav combines spatial-temporal contrastive learning with reinforcement learning for BEV-based robot navigation, with the aim of more reliable decision-making in dynamic environments.

Findings

01  Outperforms state-of-the-art navigation methods in dense pedestrian scenarios.

02  Demonstrates robustness across multiple benchmark environments.

03  Achieves superior navigation success rates and reliability.

Abstract

Goal-driven mobile robot navigation in map-less environments requires effective state representations for reliable decision-making. Inspired by the favorable properties of Bird's-Eye View (BEV) in point clouds for visual perception, this paper introduces a novel navigation approach named BEVNav. It employs deep reinforcement learning to learn BEV representations and enhance decision-making reliability. First, we propose a self-supervised spatial-temporal contrastive learning approach to learn BEV representations. Spatially, two randomly augmented views from a point cloud predict each other, enhancing spatial features. Temporally, we combine the current observation with consecutive frames' actions to predict future features, establishing the relationship between observation transitions and actions to capture temporal cues. Then, incorporating this spatial-temporal contrastive learning in…
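The spatial and temporal objectives described in the abstract can be illustrated with a small sketch. It assumes an InfoNCE-style contrastive loss, which the abstract does not specify, and it operates on plain feature vectors rather than real BEV encodings. The function names (`info_nce`, `symmetric_spatial_loss`, `predict_future`), the temperature value, and the linear predictor are all hypothetical and are not taken from the paper.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def info_nce(queries, keys, temperature=0.1):
    """Mean InfoNCE loss: queries[i] should match keys[i] against all other keys.

    Assumption: the paper's exact loss is not given in the abstract; InfoNCE
    is used here only as a common contrastive objective.
    """
    q = [normalize(v) for v in queries]
    k = [normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        # Cosine similarity of query i with every key, scaled by temperature.
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        # Numerically stable log-sum-exp over all keys.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(q)


def symmetric_spatial_loss(view_a, view_b, temperature=0.1):
    """Spatial term: features of two augmented views predict each other."""
    return 0.5 * (
        info_nce(view_a, view_b, temperature)
        + info_nce(view_b, view_a, temperature)
    )


def predict_future(feature, actions, weights):
    """Hypothetical linear predictor: current feature + actions -> future feature.

    `weights` is a list of rows, one per output dimension, each as long as the
    concatenation of the feature and the flattened action sequence.
    """
    x = list(feature) + [a for step in actions for a in step]
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def temporal_loss(predicted_features, future_features, temperature=0.1):
    """Temporal term: predicted future features should match the real ones."""
    return info_nce(predicted_features, future_features, temperature)
```

In this sketch a training step would encode two augmented views of each point cloud, evaluate `symmetric_spatial_loss` on the resulting features, pass the current feature and the intervening actions through `predict_future`, and score the result against the encoded future frame with `temporal_loss`. How these terms are weighted against the reinforcement learning objective is not stated in the truncated abstract and is left open here.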

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning