Explore until Confident: Efficient Exploration for Embodied Question Answering

Allen Z. Ren; Jaden Clark; Anushri Dixit; Masha Itkina; Anirudha Majumdar; Dorsa Sadigh

arXiv:2403.15941·cs.RO·July 9, 2024·1 cites

Open Access

TL;DR

This paper introduces a novel method for embodied question answering that combines semantic mapping, large vision-language models, and confidence calibration to enable robots to explore environments efficiently and answer questions accurately.

Contribution

It proposes a new framework that uses semantic mapping and conformal prediction to improve exploration efficiency and answer confidence calibration in embodied question answering tasks.
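To illustrate how conformal prediction can turn raw answer probabilities into a calibrated stopping signal, here is a minimal sketch of generic split conformal prediction over multiple-choice answers. It is not the paper's implementation. The function names, the nonconformity score (one minus the probability assigned to the true answer), and the rule "stop once the prediction set contains exactly one option" are assumptions made for this example.

```python
import math


def conformal_threshold(cal_true_probs, alpha):
    """Compute the split-conformal threshold q_hat.

    cal_true_probs: probabilities the model assigned to the TRUE answer
                    on held-out calibration questions (hypothetical data).
    alpha:          target error rate; prediction sets aim to contain the
                    true answer with probability at least 1 - alpha.
    """
    # Nonconformity score (assumed for this sketch): 1 - p(true answer).
    scores = sorted(1.0 - p for p in cal_true_probs)
    n = len(scores)
    # Finite-sample-corrected rank of the quantile.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: keep every option.
        return float("inf")
    return scores[k - 1]


def prediction_set(option_probs, q_hat):
    """Return all answer options whose score is within the threshold."""
    return [opt for opt, p in option_probs.items() if 1.0 - p <= q_hat]


def should_stop(option_probs, q_hat):
    """Assumed stopping rule: stop exploring once one option remains."""
    return len(prediction_set(option_probs, q_hat)) == 1
```

With a threshold calibrated this way, an ambiguous view (two options with similar probability) yields a set with more than one answer, so the agent would keep exploring. A view that clearly favors one option yields a singleton set, so the agent would stop and answer.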

Findings

1. Enhanced exploration efficiency over baseline methods

2. Improved answer accuracy with calibrated confidence

3. Validated in both simulated and real robot experiments

Abstract

We consider the problem of Embodied Question Answering (EQA), which refers to settings where an embodied agent such as a robot needs to actively explore an environment to gather information until it is confident about the answer to a question. In this work, we leverage the strong semantic reasoning capabilities of large vision-language models (VLMs) to efficiently explore and answer such questions. However, there are two main challenges when using VLMs in EQA: they do not have an internal memory for mapping the scene to be able to plan how to explore over time, and their confidence can be miscalibrated and can cause the robot to prematurely stop exploration or over-explore. We propose a method that first builds a semantic map of the scene based on depth information and via visual prompting of a VLM - leveraging its vast knowledge of relevant regions of the scene for exploration. Next,…

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques