Blindfold Baselines for Embodied QA

Ankesh Anand; Eugene Belilovsky; Kyle Kastner; Hugo Larochelle; Aaron Courville

arXiv:1811.05013 · cs.CV · November 14, 2018 · 32 cites

TL;DR

This paper evaluates question-only ("blindfold") baselines for Embodied Question Answering and finds that they achieve state-of-the-art results on the EQAv1 dataset except when the agent is spawned extremely close to the target object, which points to potential issues in current evaluation methods.

Contribution

The paper demonstrates that question-only baselines can match or outperform complex navigation-based methods on EmbodiedQA, challenging the assumption that strong results on the task require understanding the environment.

Findings

01

Question-only baseline achieves state-of-the-art results in EmbodiedQA.

02

The blindfold approach matches or beats existing methods in all cases except when the agent is spawned extremely close to the target object.

03

Highlights potential evaluation issues in EmbodiedQA tasks.

Abstract

We explore blindfold (question-only) baselines for Embodied Question Answering. The EmbodiedQA task requires an agent to answer a question by intelligently navigating in a simulated environment, gathering necessary visual information only through first-person vision before finally answering. Consequently, a blindfold baseline which ignores the environment and visual information is a degenerate solution, yet we show through our experiments on the EQAv1 dataset that a simple question-only baseline achieves state-of-the-art results on the EmbodiedQA task in all cases except when the agent is spawned extremely close to the object.
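To make the idea of a blindfold baseline concrete, the following is a minimal sketch of a predictor that sees only the question text and never the environment. It is not the authors' model: the class name `QuestionOnlyBaseline`, the majority-answer lookup, and the toy question-answer pairs are all assumptions chosen for illustration, and none of the data comes from EQAv1.

```python
from collections import Counter, defaultdict


class QuestionOnlyBaseline:
    """Hypothetical baseline: answers from question text alone (no vision, no navigation)."""

    def __init__(self):
        self.per_question = defaultdict(Counter)  # normalised question -> answer counts
        self.overall = Counter()                  # answer counts over all training pairs

    @staticmethod
    def _key(question):
        # Normalise case and whitespace so trivially different strings match.
        return " ".join(question.lower().split())

    def fit(self, pairs):
        for question, answer in pairs:
            self.per_question[self._key(question)][answer] += 1
            self.overall[answer] += 1
        return self

    def predict(self, question):
        counts = self.per_question.get(self._key(question))
        if counts:
            # Most frequent training answer for this exact question.
            return counts.most_common(1)[0][0]
        # Unseen question: fall back to the most frequent answer overall.
        return self.overall.most_common(1)[0][0]


# Hypothetical toy data, not taken from EQAv1.
train = [
    ("what color is the sofa?", "brown"),
    ("what color is the sofa?", "brown"),
    ("what color is the sofa?", "grey"),
    ("what room is the piano located in?", "living room"),
]
model = QuestionOnlyBaseline().fit(train)
print(model.predict("What color is the sofa?"))
```

Because the predictor never receives an observation, any accuracy it reaches reflects regularities in the question-answer distribution rather than perception or navigation, which is the kind of evaluation concern the abstract raises.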

Code & Models

Repositories

ankeshanand/blindfold-baselines-eqa
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling