One Map to Find Them All: Real-time Open-Vocabulary Mapping for Zero-shot Multi-Object Navigation
Finn Lukas Busch, Timon Homberger, Jesús Ortega-Peimbert, Quantao Yang, Olov Andersson

TL;DR
This paper introduces a real-time open-vocabulary mapping approach for zero-shot multi-object navigation. It lets robots search for multiple objects in complex environments more efficiently by reusing knowledge gathered during earlier searches and by accounting for semantic uncertainty.
Contribution
The paper presents a new benchmark for zero-shot multi-object navigation and a reusable open-vocabulary semantic map with probabilistic updates. Reusing information from prior searches makes subsequent object searches more efficient.
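The benchmark's core idea, reusing a map built during earlier searches when a new object is queried, can be illustrated with a toy sketch. This is not the authors' implementation: `PersistentSemanticMap`, `run_queries`, the 0.9 similarity threshold, and the fixed cell-by-cell exploration order are all hypothetical, and the short feature lists stand in for the open-vocabulary embeddings a real system would obtain from a vision-language model.

```python
import math
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


class PersistentSemanticMap:
    """Hypothetical map that keeps one feature vector per grid cell across queries."""

    def __init__(self) -> None:
        self.features: Dict[Cell, List[float]] = {}

    def insert(self, cell: Cell, feature: List[float]) -> None:
        # Overwrite for simplicity; a real system would fuse repeated observations.
        self.features[cell] = feature

    def best_match(self, query: List[float], threshold: float) -> Optional[Cell]:
        """Return the most query-similar stored cell, or None if none clears the threshold."""
        best_cell, best_score = None, -math.inf
        for cell, feat in self.features.items():
            score = cosine(feat, query)
            if score > best_score:
                best_cell, best_score = cell, score
        return best_cell if best_score >= threshold else None


def run_queries(world: Dict[Cell, List[float]],
                queries: List[List[float]],
                threshold: float = 0.9) -> List[int]:
    """Count hypothetical exploration steps per query while keeping one shared map.

    `world` is a toy stand-in for the environment (cell -> true feature).
    Exploration visits unseen cells in sorted order and stores every feature it sees,
    so objects observed while searching for one query remain available to later ones.
    """
    smap = PersistentSemanticMap()
    explore_order = sorted(world)
    steps_per_query: List[int] = []
    for query in queries:
        if smap.best_match(query, threshold) is not None:
            steps_per_query.append(0)  # already in the map: no new exploration needed
            continue
        steps = 0
        for cell in explore_order:
            if cell in smap.features:
                continue  # never re-explore a cell that is already mapped
            smap.insert(cell, world[cell])
            steps += 1
            if cosine(world[cell], query) >= threshold:
                break  # target found
        steps_per_query.append(steps)
    return steps_per_query
```

With a toy world `{(0, 0): [1, 0, 0], (0, 1): [0, 1, 0], (1, 0): [0, 0, 1]}`, `run_queries(world, [[0, 0, 1], [0, 1, 0]])` returns `[3, 0]`: the first search passes the second object on its way, so the second query is answered from the map, whereas a fresh search for it alone (`run_queries(world, [[0, 1, 0]])`) would take 2 steps.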
Findings
Outperforms state-of-the-art methods in simulation and real-world tests.
Effective in both single- and multi-object navigation tasks.
Operates in real time on an NVIDIA Jetson AGX Orin.
Abstract
The capability to efficiently search for objects in complex environments is fundamental for many real-world robot applications. Recent advances in open-vocabulary vision models have resulted in semantically informed object navigation methods that allow a robot to search for an arbitrary object without prior training. However, these zero-shot methods have so far treated the environment as unknown for each consecutive query. In this paper, we introduce a new benchmark for zero-shot multi-object navigation, allowing the robot to leverage information gathered from previous searches to more efficiently find new objects. To address this problem, we build a reusable open-vocabulary feature map tailored for real-time object search. We further propose a probabilistic-semantic map update that mitigates common sources of errors in semantic feature extraction and leverages this semantic uncertainty…
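The abstract does not spell out the probabilistic-semantic update rule, so the following is only a minimal sketch of one plausible idea, not the authors' method. It assumes each observation of a map cell arrives with a confidence in (0, 1] (for example derived from viewing distance or a detector score), fuses features with a confidence-weighted running mean, and derives an illustrative uncertainty of 1 / (1 + accumulated weight). `CellBelief`, `classify`, both thresholds, and the uncertainty formula are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


@dataclass
class CellBelief:
    """Hypothetical per-cell state: confidence-weighted mean feature plus total evidence."""
    mean: List[float] = field(default_factory=list)
    weight: float = 0.0

    def update(self, feature: List[float], confidence: float) -> None:
        """Fuse one observation; low-confidence observations move the mean less."""
        if confidence <= 0.0:
            return  # ignore observations carrying no confidence
        if self.weight == 0.0:
            self.mean = list(feature)
        else:
            total = self.weight + confidence
            self.mean = [
                (self.weight * m + confidence * f) / total
                for m, f in zip(self.mean, feature)
            ]
        self.weight += confidence

    def uncertainty(self) -> float:
        """Illustrative uncertainty: 1.0 with no evidence, shrinking toward 0 as evidence grows."""
        return 1.0 / (1.0 + self.weight)


def classify(cell: CellBelief, query: List[float],
             sim_threshold: float = 0.8, max_uncertainty: float = 0.3) -> str:
    """Hypothetical decision rule combining query similarity with cell uncertainty."""
    if cell.weight == 0.0:
        return "unknown"      # never observed
    if cosine(cell.mean, query) < sim_threshold:
        return "unlikely"     # observed, but does not match the query
    # A match backed by little evidence is worth re-observing before trusting it.
    return "confirmed" if cell.uncertainty() <= max_uncertainty else "verify"
```

For example, fusing a confident observation `[1.0, 0.0]` (confidence 0.9) with a likely misdetection `[0.0, 1.0]` (confidence 0.1) gives a mean of about `[0.9, 0.1]`, where a plain average would give `[0.5, 0.5]`. The cell matches the query `[1.0, 0.0]` but is classified as `"verify"` because its uncertainty is 0.5; two further confident observations lower the uncertainty to 0.25 and the result becomes `"confirmed"`. Separating "matches the query" from "enough evidence to trust the match" is one way a navigator could use semantic uncertainty to decide where to look again.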
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Semantic Web and Ontologies
Methods: Sparse Evolutionary Training
