'Just because you are right, doesn't mean I am wrong': Overcoming a Bottleneck in the Development and Evaluation of Open-Ended Visual Question Answering (VQA) Tasks

Man Luo; Shailaja Keyur Sampat; Riley Tallman; Yankai Zeng; Manuha Vancha; Akarshan Sajja; Chitta Baral

arXiv:2103.15022·cs.CL·June 2, 2022

TL;DR

This paper introduces Alternative Answer Sets (AAS) to address the limitation of single ground-truth answers in VQA datasets, improving model evaluation by recognizing multiple plausible answers.

Contribution

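The paper proposes an automatic method that generates AAS with off-the-shelf NLP tools, and it modifies top VQA solvers to support multiple plausible answers per question, so that evaluation credits correct answers that differ from the single ground truth.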
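
As a rough illustration of what generating an alternative answer set could look like, the sketch below expands one ground-truth answer with related words from a small hand-written lexicon. The lexicon contents, the function name `build_aas`, and the choice of synonym and hypernym relations are all assumptions made for this example; the paper's pipeline relies on off-the-shelf NLP tools that this summary does not name, and it may work quite differently.

```python
# Hypothetical toy lexicon standing in for an off-the-shelf lexical resource.
# The entries are illustrative and are not taken from the paper or from GQA.
TOY_LEXICON = {
    "couch": {"synonyms": ["sofa"], "hypernyms": ["furniture"]},
    "kid": {"synonyms": ["child"], "hypernyms": ["person"]},
}


def build_aas(answer, lexicon=TOY_LEXICON, relations=("synonyms", "hypernyms")):
    """Return a set of answers treated as acceptable for one ground-truth answer.

    The original answer is always included; related words are added only
    when the lexicon has an entry for it.
    """
    key = answer.strip().lower()
    aas = {key}
    entry = lexicon.get(key, {})
    for relation in relations:
        aas.update(word.lower() for word in entry.get(relation, []))
    return aas
```

An answer with no lexicon entry simply maps to a one-element set, so in this sketch the single-ground-truth setting is the special case where no alternatives are found.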

Findings

1. Performance on the GQA dataset improves with AAS support.
2. A semantic metric based on AAS better captures answer correctness.
3. The approach supports multiple plausible answers per question, giving a more realistic evaluation.

Abstract

GQA (Hudson and Manning, 2019) is a dataset for real-world visual reasoning and compositional question answering. We found that many answers predicted by the best vision-language models on the GQA dataset do not match the ground-truth answer but are still semantically meaningful and correct in the given context. In fact, this is the case with most existing visual question answering (VQA) datasets, which assume only one ground-truth answer for each question. To address this limitation, we propose Alternative Answer Sets (AAS) of ground-truth answers, which are created automatically using off-the-shelf NLP tools. We introduce a semantic metric based on AAS and modify top VQA solvers to support multiple plausible answers for a question. We implement this approach on the GQA dataset and show the performance improvements. Code and data are available in this link…
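
The difference between single-answer scoring and AAS-based scoring can be sketched as follows. This is a minimal binary-credit variant assumed for illustration: a prediction counts as correct if it appears anywhere in the question's alternative answer set. The function names and the example data are hypothetical, and the paper's actual metric may assign credit differently.

```python
def _normalize(text):
    """Lowercase and trim an answer string before comparison."""
    return text.strip().lower()


def exact_match_accuracy(predictions, ground_truths):
    """Fraction of predictions equal to the single ground-truth answer."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have equal length")
    if not predictions:
        return 0.0
    hits = sum(
        _normalize(pred) == _normalize(truth)
        for pred, truth in zip(predictions, ground_truths)
    )
    return hits / len(predictions)


def aas_accuracy(predictions, answer_sets):
    """Fraction of predictions found in the alternative answer set.

    Assumed binary credit: any member of the set counts as fully correct.
    """
    if len(predictions) != len(answer_sets):
        raise ValueError("predictions and answer_sets must have equal length")
    if not predictions:
        return 0.0
    hits = sum(
        _normalize(pred) in {_normalize(ans) for ans in answers}
        for pred, answers in zip(predictions, answer_sets)
    )
    return hits / len(predictions)


# Illustrative data only; not drawn from GQA.
predictions = ["sofa", "cat"]
ground_truths = ["couch", "cat"]
answer_sets = [{"couch", "sofa"}, {"cat"}]

strict_score = exact_match_accuracy(predictions, ground_truths)
relaxed_score = aas_accuracy(predictions, answer_sets)
```

On this toy input the prediction "sofa" is penalized under exact match but accepted under the set-based score, which is the kind of gap the abstract describes.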

Code & Models

Repositories

luomancs/alternative_answer_set (official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques