Blindfold Baselines for Embodied QA
Ankesh Anand, Eugene Belilovsky, Kyle Kastner, Hugo Larochelle, Aaron Courville

TL;DR
This paper investigates question-only baselines for Embodied Question Answering and shows that such simple approaches can achieve state-of-the-art results, which points to potential issues in current evaluation methods.
Contribution
It demonstrates that question-only baselines can outperform or match complex navigation-based methods in EmbodiedQA, challenging assumptions about environment understanding.
Findings
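A simple question-only baseline achieves state-of-the-art results on the EmbodiedQA task, evaluated on the EQAv1 dataset.
The blindfold baseline holds up in all cases except when the agent is spawned extremely close to the target object.
The result highlights potential issues with how EmbodiedQA tasks are currently evaluated.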
Abstract
We explore blindfold (question-only) baselines for Embodied Question Answering. The EmbodiedQA task requires an agent to answer a question by intelligently navigating in a simulated environment, gathering necessary visual information only through first-person vision before finally answering. Consequently, a blindfold baseline which ignores the environment and visual information is a degenerate solution, yet we show through our experiments on the EQAv1 dataset that a simple question-only baseline achieves state-of-the-art results on the EmbodiedQA task in all cases except when the agent is spawned extremely close to the object.
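To make the idea of a blindfold baseline concrete, the sketch below shows one very simple question-only predictor: it memorizes the most frequent answer for each training question and never looks at the environment. This is an illustration under stated assumptions, not the model used in the paper. The function names `fit_question_only` and `predict` are hypothetical, and the toy question-answer pairs are invented for the example rather than drawn from EQAv1.

```python
from collections import Counter, defaultdict


def fit_question_only(train_pairs):
    """Learn the most frequent answer per question string.

    No visual or navigation input is used at any point (the "blindfold").
    """
    per_question = defaultdict(Counter)
    overall = Counter()
    for question, answer in train_pairs:
        key = question.lower().strip()
        per_question[key][answer] += 1
        overall[answer] += 1
    # Fallback for unseen questions: the globally most frequent answer.
    fallback = overall.most_common(1)[0][0]
    table = {q: counts.most_common(1)[0][0] for q, counts in per_question.items()}
    return table, fallback


def predict(model, question):
    """Answer a question from its text alone."""
    table, fallback = model
    return table.get(question.lower().strip(), fallback)


# Hypothetical toy data, invented for illustration (not from EQAv1).
train = [
    ("what color is the sofa?", "brown"),
    ("what color is the sofa?", "brown"),
    ("what color is the sofa?", "grey"),
    ("what room is the bed located in?", "bedroom"),
]

model = fit_question_only(train)
print(predict(model, "What color is the sofa?"))
print(predict(model, "what color is the piano?"))  # unseen question, uses fallback
```

If a predictor of this kind scores close to a navigation-based agent, the benchmark's answers are largely predictable from the question text, which is the evaluation concern the abstract raises.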
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling
