Reflex-Based Open-Vocabulary Navigation without Prior Knowledge Using Omnidirectional Camera and Multiple Vision-Language Models

Kento Kawaharazuka; Yoshiki Obinata; Naoaki Kanazawa; Naoto Tsukamoto; Kei Okada; Masayuki Inaba

arXiv:2408.11380·cs.RO·August 22, 2024

TL;DR

This paper presents a simple, map-free robot navigation method using an omnidirectional camera and pre-trained vision-language models, enabling open-vocabulary navigation without prior knowledge or learning.

Contribution

The study introduces a novel approach combining omnidirectional vision and multiple pre-trained vision-language models for map-free, prior-knowledge-free robot navigation.

Findings

1. Navigation is achieved without prior maps or learning.
2. The omnidirectional camera simplifies environment perception.
3. The method demonstrates effective open-vocabulary navigation.

Abstract

Various robot navigation methods have been developed, but they are mainly based on Simultaneous Localization and Mapping (SLAM), reinforcement learning, and similar techniques, which require prior map construction or learning. In this study, we consider the simplest method that does not require any map construction or learning, and execute open-vocabulary navigation of robots without any prior knowledge. We applied an omnidirectional camera and pre-trained vision-language models to the robot. The omnidirectional camera provides a uniform view of the surroundings, thus eliminating the need for complicated exploratory behaviors including trajectory generation. By applying multiple pre-trained vision-language models to this omnidirectional image and incorporating reflexive behaviors, we show that navigation becomes simple and does not require any prior setup. Interesting properties and…
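The abstract does not spell out the control loop, so the following is only a minimal sketch of one way a map-free, reflex-style step could look, not the authors' implementation. It assumes the omnidirectional image is a panorama whose columns run from directly behind the robot (heading -pi) through forward (heading 0) to behind again (+pi), and that some vision-language model is wrapped in a caller-supplied `score_fn(sector, instruction)` returning a relevance score. The names `split_panorama`, `sector_heading`, `choose_heading`, `reflex_command`, and all gains and speeds are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple


def split_panorama(columns: Sequence, n_sectors: int) -> List[Sequence]:
    """Cut the panorama columns into n_sectors contiguous slices."""
    width = len(columns)
    bounds = [round(i * width / n_sectors) for i in range(n_sectors + 1)]
    return [columns[bounds[i]:bounds[i + 1]] for i in range(n_sectors)]


def sector_heading(index: int, n_sectors: int) -> float:
    """Heading in radians of a sector's centre (assumed convention: 0 = forward)."""
    return -math.pi + (index + 0.5) * 2.0 * math.pi / n_sectors


def choose_heading(
    columns: Sequence,
    instruction: str,
    score_fn: Callable[[Sequence, str], float],
    n_sectors: int = 8,
) -> Tuple[float, float]:
    """Score every sector against the instruction and return the best heading and score."""
    sectors = split_panorama(columns, n_sectors)
    scores = [score_fn(sector, instruction) for sector in sectors]
    best = max(range(n_sectors), key=scores.__getitem__)
    return sector_heading(best, n_sectors), scores[best]


def reflex_command(
    heading: float,
    gain: float = 1.0,          # assumed proportional turn gain
    max_turn: float = 1.0,      # assumed turn-rate limit
    forward_speed: float = 0.2  # assumed cruise speed
) -> Tuple[float, float]:
    """Map a target heading to (forward speed, turn rate) with no map or planner."""
    turn = max(-max_turn, min(max_turn, gain * heading))
    # Move forward only when the target lies in the front half-plane.
    speed = forward_speed * max(0.0, math.cos(heading))
    return speed, turn
```

In use, `score_fn` would wrap whichever pre-trained model is available (for example an image-text similarity model), and the loop would call `choose_heading` followed by `reflex_command` on every new frame. Because the camera sees all directions at once, this sketch needs no exploratory motion or trajectory generation, which matches the motivation given in the abstract.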
