Improving Zero-Shot ObjectNav with Generative Communication

Vishnu Sashank Dorbala; Vishnu Dutt Sharma; Pratap Tokekar; Dinesh Manocha

arXiv:2408.01877·cs.RO·October 3, 2024

TL;DR

This paper introduces a generative communication approach between agents with different views to improve zero-shot ObjectNav, analyzing hallucinations and cooperation effects, with real-world validation.

Contribution

It presents a novel generative communication framework for embodied agents with vision-language models, addressing hallucinations and cooperation in zero-shot ObjectNav.

Findings

1. Selective assistance improves navigation success rate and efficiency.

2. Hallucinations correlate strongly with navigation performance.

3. Prompt finetuning reduces hallucinations and enhances real-world ObjectNav.

Abstract

We propose a new method for improving zero-shot ObjectNav that aims to utilize potentially available environmental percepts for navigational assistance. Our approach takes into account that the ground agent may have a limited and sometimes obstructed view. Our formulation encourages Generative Communication (GC) between an assistive overhead agent with a global view containing the target object and a ground agent with an obfuscated view, both equipped with Vision-Language Models (VLMs) for vision-to-language translation. In this assisted setup, the embodied agents communicate environmental information before the ground agent executes actions towards a target. Despite the overhead agent having a global view that contains the target, we note a drop in performance (-13% in OSR and -13% in SPL) for a fully cooperative assistance scheme relative to an unassisted baseline. In contrast, a selective assistance…
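The abstract reports results in OSR and SPL. As a rough illustration of what these numbers measure, the sketch below computes both over a list of episodes, assuming OSR denotes Oracle Success Rate (the agent came within the success threshold of the target at any step) and SPL denotes Success weighted by Path Length as commonly defined in the embodied navigation literature. The `Episode` record and its field names are hypothetical, and the paper's own evaluation code may differ.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Episode:
    """Hypothetical per-episode record; field names are illustrative only."""
    success: bool          # agent stopped within the success threshold
    oracle_success: bool   # agent was within the threshold at any step
    shortest_path: float   # shortest start-to-goal distance (assumed > 0)
    agent_path: float      # length of the path the agent actually took


def spl(episodes: Sequence[Episode]) -> float:
    """Success weighted by Path Length, averaged over all episodes."""
    if not episodes:
        raise ValueError("need at least one episode")
    total = 0.0
    for ep in episodes:
        if ep.success:
            # Failed episodes contribute 0; successful ones are scaled
            # by how close the agent's path was to the shortest path.
            total += ep.shortest_path / max(ep.agent_path, ep.shortest_path)
    return total / len(episodes)


def oracle_success_rate(episodes: Sequence[Episode]) -> float:
    """Fraction of episodes in which the agent ever reached the target."""
    if not episodes:
        raise ValueError("need at least one episode")
    return sum(ep.oracle_success for ep in episodes) / len(episodes)


# Made-up episodes for illustration; these are not results from the paper.
episodes = [
    Episode(success=True, oracle_success=True, shortest_path=5.0, agent_path=10.0),
    Episode(success=False, oracle_success=True, shortest_path=4.0, agent_path=8.0),
    Episode(success=True, oracle_success=True, shortest_path=3.0, agent_path=3.0),
    Episode(success=False, oracle_success=False, shortest_path=6.0, agent_path=2.0),
]
print(spl(episodes))                  # 0.375
print(oracle_success_rate(episodes))  # 0.75
```

Under these definitions, a drop in SPL can come either from fewer successful episodes or from longer paths in the successful ones, whereas OSR ignores path length and where the agent finally stops.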

Taxonomy

Topics: Social Robot Interaction and HRI · Reinforcement Learning in Robotics · Ethics and Social Impacts of AI

Methods: Semi-Pseudo-Label