NoisyEQA: Benchmarking Embodied Question Answering Against Noisy Queries

Tao Wu; Chuhao Zhou; Yen Heng Wong; Lin Gu; Jianfei Yang

arXiv:2412.10726 · cs.CV · December 17, 2024

TL;DR

This paper introduces NoisyEQA, a benchmark for evaluating embodied question answering systems' ability to handle noisy queries, along with a self-correction mechanism to improve answer accuracy.

Contribution

It presents a new noisy question benchmark, a self-correction prompting method, and an evaluation metric for better assessment of EQA systems under real-world noisy conditions.

Findings

01. Current EQA agents struggle with noise detection.

02. Self-correction prompts improve answer accuracy.

03. Benchmark reveals challenges in real-world noisy scenarios.

Abstract

Rapid progress in Vision-Language Models (VLMs) has significantly advanced Embodied Question Answering (EQA), enhancing agents' abilities in language understanding and reasoning within complex and realistic scenarios. However, EQA in real-world scenarios remains challenging, as human-posed questions often contain noise that can interfere with an agent's exploration and response, which poses challenges especially for language beginners and non-expert users. To address this, we introduce NoisyEQA, a benchmark designed to evaluate an agent's ability to recognize and correct noisy questions. The benchmark covers four common types of noise found in real-world applications: Latent Hallucination Noise, Memory Noise, Perception Noise, and Semantic Noise, generated through an automated dataset creation framework. We also propose a 'Self-Correction'…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems