Is the House Ready For Sleeptime? Generating and Evaluating Situational Queries for Embodied Question Answering
Vishnu Sashank Dorbala, Prasoon Goyal, Robinson Piramuthu, Michael Johnston, Reza Ghanadhan, Dinesh Manocha

TL;DR
This paper introduces a novel approach for generating and evaluating complex situational queries in embodied question answering within household environments, highlighting the strengths and limitations of large language models in this context.
Contribution
It presents the first method for generating situational queries for EQA using a Prompt-Generate-Evaluate scheme and evaluates LLM performance in both virtual and real-world settings.
Findings
LLMs can generate situational data with a high answerability rate (97.26%).
LLMs show low correlation (46.2%) with human consensus when answering the queries.
LLMs often contradict commonsense reasoning when justifying answers.
Abstract
We present and tackle the problem of Embodied Question Answering (EQA) with Situational Queries (S-EQA) in a household environment. Unlike prior EQA work tackling simple queries that directly reference target objects and properties ("What is the color of the car?"), situational queries (such as "Is the house ready for sleeptime?") are challenging as they require the agent to correctly identify multiple object-states (Doors: Closed, Lights: Off, etc.) and reach a consensus on their states for an answer. Towards this objective, we first introduce a novel Prompt-Generate-Evaluate (PGE) scheme that wraps around an LLM's output to generate unique situational queries and corresponding consensus object information. PGE is used to generate 2K datapoints in the VirtualHome simulator, which is then annotated for ground truth answers via a large scale user-study conducted on M-Turk. With a high…
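To make the idea of consensus object-states concrete, here is a minimal Python sketch of how a situational query could be paired with the object-states it depends on and checked against observed states. It is an illustration only, not the paper's PGE implementation or its LLM-based answering: the `SituationalQuery` class, the `answer` function, and the example state names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SituationalQuery:
    """Hypothetical container: a query plus the states needed for a "yes"."""
    text: str
    consensus: dict  # object name -> state required for a "yes" answer


def answer(query, observed):
    """Compare observed object-states against the query's consensus states.

    Returns (is_yes, mismatches), where mismatches maps each object whose
    observed state differs from the required one to what was observed
    ("unknown" if the object was not observed at all).
    """
    mismatches = {
        obj: observed.get(obj, "unknown")
        for obj, required in query.consensus.items()
        if observed.get(obj) != required
    }
    return (not mismatches, mismatches)


# Example states taken from the abstract (Doors: Closed, Lights: Off).
sleeptime = SituationalQuery(
    text="Is the house ready for sleeptime?",
    consensus={"doors": "closed", "lights": "off"},
)

ready, issues = answer(sleeptime, {"doors": "closed", "lights": "on"})
print(ready, issues)
```

The answer is "yes" only when every consensus object is observed in its required state, which is why a single mismatched or unobserved object is enough to flip the result.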
Taxonomy
Topics: Natural Language Processing Techniques · Speech and dialogue systems · Semantic Web and Ontologies
Methods: Sparse Evolutionary Training
