NoisyEQA: Benchmarking Embodied Question Answering Against Noisy Queries
Tao Wu, Chuhao Zhou, Yen Heng Wong, Lin Gu, Jianfei Yang

TL;DR
This paper introduces NoisyEQA, a benchmark for evaluating how well embodied question answering (EQA) systems handle noisy queries, along with a self-correction mechanism that improves answer accuracy.
Contribution
It presents a new noisy question benchmark, a self-correction prompting method, and an evaluation metric for better assessment of EQA systems under real-world noisy conditions.
Findings
Current EQA agents struggle with noise detection.
Self-correction prompts improve answer accuracy.
Benchmark reveals challenges in real-world noisy scenarios.
Abstract
Rapid progress in Vision-Language Models (VLMs) has significantly advanced Embodied Question Answering (EQA), improving agents' language understanding and reasoning in complex, realistic scenarios. However, EQA in real-world scenarios remains challenging because human-posed questions often contain noise that can interfere with an agent's exploration and response, a problem that particularly affects language beginners and non-expert users. To address this, we introduce the NoisyEQA benchmark, designed to evaluate an agent's ability to recognize and correct noisy questions. The benchmark covers four common types of noise found in real-world applications: Latent Hallucination Noise, Memory Noise, Perception Noise, and Semantic Noise. The noisy questions are generated through an automated dataset creation framework. Additionally, we propose a 'Self-Correction'…
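The abstract names the four noise types and a self-correction prompting idea but does not say how the prompts are structured. The sketch below is only an illustration, assuming a generic detect, rewrite, then answer loop around a text-in/text-out model function. `NoiseType`, `ModelFn`, `self_correct_and_answer`, the prompt wording, and the label strings are all hypothetical, not the authors' prompts or code, and a real EQA agent would also condition on its visual observations, which this sketch omits.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class NoiseType(Enum):
    # The four noise categories named in the NoisyEQA abstract (labels are hypothetical).
    LATENT_HALLUCINATION = "latent_hallucination"
    MEMORY = "memory"
    PERCEPTION = "perception"
    SEMANTIC = "semantic"


@dataclass
class CorrectionResult:
    noise: Optional[NoiseType]  # None when the model judges the question clean
    corrected_question: str
    answer: str


# Hypothetical model interface: takes a prompt string, returns the model's text reply.
ModelFn = Callable[[str], str]


def self_correct_and_answer(question: str, model: ModelFn) -> CorrectionResult:
    # Step 1: ask the model to label the question with one noise type or 'none'.
    labels = ", ".join(t.value for t in NoiseType)
    verdict = model(
        f"Does this question contain noise ({labels})? "
        f"Reply with one label or 'none'.\nQuestion: {question}"
    ).strip().lower()
    # Map the reply to a NoiseType; anything unrecognized is treated as clean.
    noise = next((t for t in NoiseType if t.value == verdict), None)

    # Step 2: only if noise was flagged, ask the model to rewrite the question.
    corrected = question
    if noise is not None:
        corrected = model(
            f"Rewrite the question to remove the {noise.value} noise.\nQuestion: {question}"
        ).strip()

    # Step 3: answer the (possibly corrected) question.
    answer = model(f"Answer the question.\nQuestion: {corrected}").strip()
    return CorrectionResult(noise, corrected, answer)
```

For example, with a scripted stub in place of a VLM, a question that presupposes a nonexistent "red sofa" could be flagged as perception noise, rewritten, and then answered. Keeping detection and rewriting as separate calls makes it easy to score noise recognition on its own, but whether NoisyEQA's method or metric works this way is not stated in the abstract.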
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
