VLFM: Vision-Language Frontier Maps for Zero-Shot Semantic Navigation

Naoki Yokoyama; Sehoon Ha; Dhruv Batra; Jiuguang Wang; Bernadette Bucher

arXiv:2312.03275 · cs.RO · December 7, 2023 · 2 cites

TL;DR

VLFM is a zero-shot semantic navigation method that uses a pre-trained vision-language model to decide where to explore when searching for a target object in a novel environment. It achieves state-of-the-art results on the Gibson, HM3D, and MP3D datasets in simulation and has been deployed on a real-world robot.

Contribution

The paper presents VLFM, which combines depth-derived occupancy maps (used to identify frontiers) with a language-grounded value map from a pre-trained vision-language model to perform zero-shot semantic navigation in unseen environments.

Findings

01 Achieves state-of-the-art Success weighted by Path Length (SPL) on the Gibson, HM3D, and MP3D datasets.

02 Successfully deployed on a real-world robot (Boston Dynamics Spot).

03 Demonstrates effective zero-shot navigation without prior environment knowledge.
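
SPL, the metric named in the first finding, is not defined on this page. The sketch below follows the standard definition from the embodied-navigation evaluation literature: for each episode, success is weighted by the ratio of the shortest-path length to the longer of the agent's path and the shortest path, and the result is averaged over episodes. The function and argument names are illustrative, not taken from the VLFM code.

```python
from typing import Sequence


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length, averaged over episodes.

    successes[i]        -- whether episode i reached the target
    shortest_lengths[i] -- shortest-path distance from start to target
    path_lengths[i]     -- distance the agent actually travelled
    """
    n = len(successes)
    if n == 0 or n != len(shortest_lengths) or n != len(path_lengths):
        raise ValueError("inputs must be non-empty and of equal length")

    total = 0.0
    for success, shortest, travelled in zip(successes, shortest_lengths, path_lengths):
        denom = max(travelled, shortest)
        if success and denom > 0:
            # A successful episode scores 1.0 only if the agent's path is
            # as short as the shortest path; longer paths score less.
            total += shortest / denom
    return total / n


# Three episodes: optimal success, success with a 2x detour, failure.
score = spl([True, True, False], [10.0, 10.0, 10.0], [10.0, 20.0, 5.0])
```

A failed episode contributes zero regardless of path length, so SPL is always bounded above by the plain success rate.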

Abstract

Understanding how humans leverage semantic knowledge to navigate unfamiliar environments and decide where to explore next is pivotal for developing robots capable of human-like search behaviors. We introduce a zero-shot navigation approach, Vision-Language Frontier Maps (VLFM), which is inspired by human reasoning and designed to navigate towards unseen semantic objects in novel environments. VLFM builds occupancy maps from depth observations to identify frontiers, and leverages RGB observations and a pre-trained vision-language model to generate a language-grounded value map. VLFM then uses this map to identify the most promising frontier to explore for finding an instance of a given target object category. We evaluate VLFM in photo-realistic environments from the Gibson, Habitat-Matterport 3D (HM3D), and Matterport 3D (MP3D) datasets within the Habitat simulator. Remarkably, VLFM…
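
The frontier-selection step described in the abstract can be illustrated with a small sketch. It assumes a 2D occupancy grid and a value map of the same shape that has already been filled in. How VLFM actually builds the value map from RGB observations and the vision-language model is not covered by the text above, so here it is simply an input. The cell labels, the 4-connected frontier test, and all function names are hypothetical choices for illustration, not the authors' implementation.

```python
from typing import List, Optional, Tuple

# Hypothetical cell labels for the occupancy grid.
UNKNOWN, FREE, OBSTACLE = 0, 1, 2

Cell = Tuple[int, int]  # (row, col)


def find_frontier_cells(occupancy: List[List[int]]) -> List[Cell]:
    """Return free cells that have at least one unknown 4-neighbour."""
    rows = len(occupancy)
    cols = len(occupancy[0]) if rows else 0
    frontiers: List[Cell] = []
    for r in range(rows):
        for c in range(cols):
            if occupancy[r][c] != FREE:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                in_bounds = 0 <= nr < rows and 0 <= nc < cols
                if in_bounds and occupancy[nr][nc] == UNKNOWN:
                    frontiers.append((r, c))
                    break
    return frontiers


def select_frontier(
    occupancy: List[List[int]], value_map: List[List[float]]
) -> Optional[Cell]:
    """Pick the frontier cell with the highest value, or None if none exist."""
    frontiers = find_frontier_cells(occupancy)
    if not frontiers:
        return None
    return max(frontiers, key=lambda cell: value_map[cell[0]][cell[1]])


# Toy 3x3 map: the right-hand column is partly unexplored.
occupancy = [
    [FREE, FREE, UNKNOWN],
    [FREE, OBSTACLE, UNKNOWN],
    [FREE, FREE, FREE],
]
# Toy value map; the high value at (0, 0) is ignored because that
# cell is not a frontier.
value_map = [
    [1.0, 0.2, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.9],
]
best = select_frontier(occupancy, value_map)
```

In this toy map only the cells at (0, 1) and (2, 2) border unknown space, so the selection is made between those two.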

Code & Models

Repositories

bdaiinstitute/vlfm (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Robotics and Sensor-Based Localization