Improving Zero-Shot ObjectNav with Generative Communication
Vishnu Sashank Dorbala, Vishnu Dutt Sharma, Pratap Tokekar, Dinesh Manocha

TL;DR
This paper introduces a generative communication approach between agents with different views to improve zero-shot ObjectNav, analyzing hallucinations and cooperation effects, with real-world validation.
Contribution
It presents a novel generative communication framework for embodied agents with vision-language models, addressing hallucinations and cooperation in zero-shot ObjectNav.
Findings
Selective assistance improves navigation success rate and efficiency.
Hallucinations correlate strongly with navigation performance.
Prompt finetuning reduces hallucinations and enhances real-world ObjectNav.
Abstract
We propose a new method for improving zero-shot ObjectNav that aims to utilize potentially available environmental percepts for navigational assistance. Our approach takes into account that the ground agent may have a limited and sometimes obstructed view. Our formulation encourages Generative Communication (GC) between an assistive overhead agent with a global view containing the target object and the ground agent with an obfuscated view, both equipped with Vision-Language Models (VLMs) for vision-to-language translation. In this assisted setup, the embodied agents communicate environmental information before the ground agent executes actions towards a target. Despite the overhead agent having a global view with the target, we note a drop in performance (-13% in OSR and -13% in SPL) of a fully cooperative assistance scheme over an unassisted baseline. In contrast, a selective assistance…
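The two metrics quoted in the abstract can be made concrete with a small sketch. It assumes OSR means oracle success rate and SPL means success weighted by path length, as these terms are commonly defined in the embodied-navigation literature; this listing does not spell them out. The `Episode` record and its field names are hypothetical, chosen only for illustration, and the numbers in the usage example are made up rather than taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Episode:
    """Hypothetical per-episode record; field names are illustrative assumptions."""
    success: bool          # agent stopped within the success radius of the target
    oracle_success: bool   # agent came within the success radius at any point
    shortest_path: float   # shortest-path length from start to target
    agent_path: float      # length of the path the agent actually travelled


def spl(episodes: Sequence[Episode]) -> float:
    """Success weighted by path length, averaged over all episodes."""
    if not episodes:
        raise ValueError("need at least one episode")
    total = 0.0
    for ep in episodes:
        if ep.success:
            # Efficiency is capped at 1 when the agent path is no longer
            # than the shortest path.
            total += ep.shortest_path / max(ep.agent_path, ep.shortest_path)
    return total / len(episodes)


def oracle_success_rate(episodes: Sequence[Episode]) -> float:
    """Fraction of episodes in which the agent ever reached the target region."""
    if not episodes:
        raise ValueError("need at least one episode")
    return sum(ep.oracle_success for ep in episodes) / len(episodes)


# Illustrative data only, not results from the paper.
episodes = [
    Episode(success=True, oracle_success=True, shortest_path=5.0, agent_path=10.0),
    Episode(success=False, oracle_success=True, shortest_path=4.0, agent_path=8.0),
]
spl_value = spl(episodes)
osr_value = oracle_success_rate(episodes)
```

Under these definitions a failed episode contributes zero to SPL regardless of path length, while it can still count towards OSR if the agent passed near the target without stopping there.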
Taxonomy
Topics: Social Robot Interaction and HRI · Reinforcement Learning in Robotics · Ethics and Social Impacts of AI
Methods: Semi-Pseudo-Label
