OVExp: Open Vocabulary Exploration for Object-Oriented Navigation

Meng Wei; Tai Wang; Yilun Chen; Hanqing Wang; Jiangmiao Pang; Xihui Liu

arXiv:2407.09016·cs.RO·July 15, 2024·1 cites

Open Access

TL;DR

OVExp is a novel framework that leverages Vision-Language Models for open-vocabulary object navigation, enabling efficient, generalizable goal exploration without extensive training data.

Contribution

It introduces a learning-based approach that constructs scene representations with VLMs and maps goals into the same embedding space, reducing computational costs and improving generalization.

Findings

01. Outperforms previous zero-shot methods on benchmarks.
02. Generalizes well across diverse scenes.
03. Handles various goal modalities effectively.

Abstract

Object-oriented embodied navigation aims to locate specific objects, defined by category or depicted in images. Existing methods often struggle to generalize to open-vocabulary goals without extensive training data. While recent advances in Vision-Language Models (VLMs) offer a promising solution by extending object recognition beyond predefined categories, efficient goal-oriented exploration becomes more challenging in an open-vocabulary setting. We introduce OVExp, a learning-based framework that integrates VLMs for Open-Vocabulary Exploration. OVExp constructs scene representations by encoding observations with VLMs and projecting them onto top-down maps for goal-conditioned exploration. Goals are encoded in the same VLM feature space, and a lightweight transformer-based decoder predicts target locations while maintaining versatile representation abilities. To address the…
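The idea of projecting encoded observations onto a top-down map and comparing them with a goal in the same feature space can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`project_to_grid`, `cosine`, `best_cell`), the toy 2-D features, and the grid parameters are all hypothetical, and a plain cosine-similarity argmax stands in for the learned transformer-based decoder that the abstract describes.

```python
import math

def project_to_grid(points, features, grid_size, cell_m):
    """Average per-point feature vectors into top-down grid cells.

    points   : list of (x, z) ground-plane coordinates in metres,
               measured from one corner of the map (assumed convention)
    features : list of feature vectors, one per point
    Returns a dict mapping (row, col) -> mean feature vector.
    """
    dim = len(features[0])
    sums, counts = {}, {}
    for (x, z), feat in zip(points, features):
        row, col = int(z // cell_m), int(x // cell_m)
        if not (0 <= row < grid_size and 0 <= col < grid_size):
            continue  # point falls outside the map
        acc = sums.setdefault((row, col), [0.0] * dim)
        for i, v in enumerate(feat):
            acc[i] += v
        counts[(row, col)] = counts.get((row, col), 0) + 1
    return {k: [v / counts[k] for v in acc] for k, acc in sums.items()}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def best_cell(feature_map, goal):
    """Return the observed cell whose feature is most similar to the goal."""
    return max(feature_map, key=lambda k: cosine(feature_map[k], goal))

# Toy example: three observed points with made-up 2-D "VLM" features.
points = [(0.2, 0.3), (0.4, 0.1), (1.6, 1.7)]
features = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]
grid = project_to_grid(points, features, grid_size=4, cell_m=0.5)

goal = [0.1, 0.9]  # goal embedding in the same (toy) feature space
print(best_cell(grid, goal))  # (3, 3)
```

Because the goal and the map cells live in one feature space, the same lookup works whether the goal vector came from a category name or an image; in this sketch only the encoder producing `goal` would change.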

Taxonomy

Topics: Semantic Web and Ontologies · Natural Language Processing Techniques · Speech and dialogue systems