Is the House Ready For Sleeptime? Generating and Evaluating Situational Queries for Embodied Question Answering

Vishnu Sashank Dorbala, Prasoon Goyal, Robinson Piramuthu, Michael Johnston, Reza Ghanadhan, Dinesh Manocha

arXiv:2405.04732 · cs.RO · March 12, 2025

Open Access

TL;DR

This paper introduces a novel approach for generating and evaluating complex situational queries in embodied question answering within household environments, highlighting the strengths and limitations of large language models in this context.

Contribution

The paper presents the first method for generating situational queries for EQA, using a Prompt-Generate-Evaluate (PGE) scheme, and evaluates LLM performance in both virtual and real-world settings.
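This page names the Prompt-Generate-Evaluate (PGE) scheme but does not spell out its steps. The following is a minimal Python sketch of such a loop under our own assumptions: `generate` stands in for an LLM call (here a hard-coded stub, `fake_llm`), and the evaluate step simply rejects duplicate queries and datapoints that carry no consensus object-states. `Datapoint`, `pge`, and `fake_llm` are hypothetical names, not the authors' code.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Datapoint:
    query: str
    # Hypothetical consensus info: object name -> state it must be in.
    consensus: dict[str, str] = field(default_factory=dict)


def pge(prompt: str, generate: Callable[[str], list[Datapoint]], rounds: int) -> list[Datapoint]:
    """Illustrative Prompt-Generate-Evaluate loop (assumed structure)."""
    kept: list[Datapoint] = []
    seen: set[str] = set()
    for _ in range(rounds):
        # Prompt: append already-accepted queries so the generator can avoid repeats (assumption).
        full_prompt = prompt + "\nAvoid these queries:\n" + "\n".join(sorted(seen))
        # Generate: ask the (stubbed) LLM for candidate datapoints.
        for dp in generate(full_prompt):
            key = dp.query.strip().lower()
            # Evaluate: keep only new, non-empty queries that list consensus object-states.
            if key and key not in seen and dp.consensus:
                seen.add(key)
                kept.append(dp)
    return kept


def fake_llm(prompt: str) -> list[Datapoint]:
    # Stub generator: ignores the prompt and returns fixed candidates.
    return [
        Datapoint("Is the house ready for sleeptime?", {"door": "closed", "light": "off"}),
        Datapoint("Is the house ready for sleeptime?", {"door": "closed"}),  # duplicate query
        Datapoint("Is the kitchen clean?", {}),  # no consensus info
    ]


kept = pge("Generate situational queries for a household.", fake_llm, rounds=2)
print(len(kept))  # prints 1
```

With the stub, only the first candidate survives: the second repeats its query and the third lists no object-states. A real evaluate step would likely apply richer checks than these two.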

Findings

01. LLMs can generate situational data with high answerability (97.26%).

02. LLMs show low correlation (46.2%) with human consensus when answering queries.

03. LLMs often contradict commonsense reasoning when justifying their answers.
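This page does not say how the 46.2% figure is computed, and "correlation" may refer to a different statistic. As a stand-in only, the sketch below measures percent agreement between model answers and a per-query majority vote over human annotations; `majority_label` and `agreement_rate` are hypothetical helpers.

```python
from __future__ import annotations

from collections import Counter


def majority_label(votes: list[str]) -> str:
    # Counter.most_common orders equal counts by first occurrence, so ties go to the earliest vote.
    return Counter(votes).most_common(1)[0][0]


def agreement_rate(model_answers: list[str], human_votes: list[list[str]]) -> float:
    """Percentage of queries where the model matches the human majority answer."""
    if not model_answers or len(model_answers) != len(human_votes):
        raise ValueError("need one non-empty vote list per model answer")
    hits = sum(m == majority_label(v) for m, v in zip(model_answers, human_votes))
    return 100.0 * hits / len(model_answers)


model = ["Yes", "No", "Yes", "No"]
humans = [["Yes", "Yes", "No"], ["Yes", "Yes", "No"], ["No", "No", "Yes"], ["No", "No", "No"]]
print(agreement_rate(model, humans))  # prints 50.0
```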

Abstract

We present and tackle the problem of Embodied Question Answering (EQA) with Situational Queries (S-EQA) in a household environment. Unlike prior EQA work, which tackles simple queries that directly reference target objects and properties ("What is the color of the car?"), situational queries (such as "Is the house ready for sleeptime?") are challenging because they require the agent to correctly identify multiple object-states (Doors: Closed, Lights: Off, etc.) and reach a consensus on their states to produce an answer. Towards this objective, we first introduce a novel Prompt-Generate-Evaluate (PGE) scheme that wraps around an LLM's output to generate unique situational queries and corresponding consensus object information. PGE is used to generate 2K datapoints in the VirtualHome simulator, which are then annotated with ground-truth answers via a large-scale user study conducted on Amazon Mechanical Turk (MTurk). With a high…
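To make "reaching a consensus on object-states" concrete, here is a hedged sketch that assumes a situational query is answered "Yes" only when every object it lists is observed in its required state, and treats an unobserved object as not satisfied. The dictionaries and `answer_situational_query` are illustrative assumptions; the paper's agent and answer format may differ.

```python
from __future__ import annotations


def answer_situational_query(required: dict[str, str], observed: dict[str, str]) -> tuple[str, list[str]]:
    """Return ("Yes" or "No", sorted list of objects whose observed state disagrees)."""
    # An object missing from `observed` yields None, which never equals a required state.
    mismatches = sorted(obj for obj, state in required.items() if observed.get(obj) != state)
    return ("No" if mismatches else "Yes"), mismatches


# "Is the house ready for sleeptime?" expressed as assumed consensus object-states.
sleeptime = {"door": "closed", "light": "off"}
print(answer_situational_query(sleeptime, {"door": "closed", "light": "on", "tv": "off"}))
# prints ('No', ['light'])
```

Returning the mismatching objects alongside the answer is a design choice that gives a simple, checkable justification, which is the kind of reasoning the findings say LLMs often get wrong.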

Taxonomy

Topics: Natural Language Processing Techniques · Speech and dialogue systems · Semantic Web and Ontologies

Methods: Sparse Evolutionary Training