Reliable Semantic Understanding for Real World Zero-shot Object Goal Navigation
Halil Utku Unlu, Shuaihang Yuan, Congcong Wen, Hao Huang, Anthony Tzes, and Yi Fang

TL;DR
This paper presents a dual-model framework that pairs GLIP for initial object detection with InstructionBLIP for validation, improving semantic understanding in zero-shot object goal navigation (ZS-OGN) and robot navigation in unfamiliar environments.
Contribution
The paper introduces a new dual-component approach that integrates vision-language models for better semantic recognition and validation in zero-shot navigation tasks.
Findings
Improved navigation accuracy in simulated environments.
Enhanced reliability of semantic recognition in real-world tests.
Marked improvements in navigation precision and reliability across both simulated and real-world tests.
Abstract
We introduce an innovative approach to advancing semantic understanding in zero-shot object goal navigation (ZS-OGN), enhancing the autonomy of robots in unfamiliar environments. Traditional reliance on labeled data has been a limitation for robotic adaptability, which we address by employing a dual-component framework that integrates a GLIP Vision Language Model for initial detection and an InstructionBLIP model for validation. This combination not only refines object and environmental recognition but also fortifies the semantic interpretation, pivotal for navigational decision-making. Our method, rigorously tested in both simulated and real-world settings, exhibits marked improvements in navigation precision and reliability.
Taxonomy
Topics: AI-based Problem Solving and Planning · Robotic Path Planning Algorithms · Target Tracking and Data Fusion in Sensor Networks
