Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly

Spencer Whitehead; Suzanne Petryk; Vedaad Shakib; Joseph Gonzalez; Trevor Darrell; Anna Rohrbach; Marcus Rohrbach

arXiv:2204.13631 · cs.CV · October 21, 2022 · 1 citation

TL;DR

This paper introduces a framework for reliable visual question answering (VQA) that emphasizes abstaining from answering when uncertain, proposes new metrics and methods to improve coverage while maintaining low error rates, and promotes self-aware models in multimodal AI.

Contribution

The paper formulates a new reliable VQA problem that emphasizes abstention, proposes a multimodal selection function, and introduces an Effective Reliability metric to better evaluate model performance.
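To make the Effective Reliability idea concrete, here is a minimal Python sketch. It assumes a per-question score that rewards an answered question by its VQA accuracy, charges a hypothetical cost `c` when an answered question is wrong (accuracy 0), and gives 0 for abstaining, with the final metric being the average over all questions. The function names and the exact scoring rule are illustrative assumptions, not the reference implementation in facebookresearch/reliable_vqa.

```python
from typing import Sequence


def phi(acc: float, answered: bool, cost: float) -> float:
    """Per-question score (assumed form): accuracy if answered and correct,
    -cost if answered and wrong (acc == 0), and 0 if the model abstains."""
    if not answered:
        return 0.0
    if acc > 0:
        return acc
    return -cost


def effective_reliability(accs: Sequence[float], answered: Sequence[bool], cost: float) -> float:
    """Average the per-question scores over every question, answered or not."""
    if len(accs) != len(answered):
        raise ValueError("accs and answered must have the same length")
    if not accs:
        raise ValueError("need at least one question")
    return sum(phi(a, g, cost) for a, g in zip(accs, answered)) / len(accs)


# Hypothetical example: the second question's top answer is wrong (accuracy 0).
accs = [1.0, 0.0, 0.6, 0.0]
print(round(effective_reliability(accs, [True, True, True, False], cost=10), 2))   # about -2.1
print(round(effective_reliability(accs, [True, False, True, False], cost=10), 2))  # about 0.4
```

With `cost=10`, answering the wrong second question drags the score down to about -2.1, while abstaining on it yields 0.4. The larger the assumed cost, the more an abstention is worth relative to a wrong answer.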

Findings

1. Models with abstention can answer fewer than 7.5% of questions at 1% error risk.
2. Using a multimodal selection function increases coverage from 6.8% to 15.6% at the same 1% risk.
3. The proposed Effective Reliability metric rewards abstaining over answering incorrectly, underscoring the role of abstention in reliability.
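The coverage figures in these findings are read at a fixed risk level (for example 1%). A minimal sketch of that computation, assuming each question has a confidence score from some selection function and a per-question accuracy in [0, 1], with risk defined as one minus the mean accuracy on the answered questions: sweep the threshold from the most to the least confident question and keep the largest coverage whose risk stays within the budget. `coverage_at_risk` is a hypothetical helper, not code from the paper's repository.

```python
from typing import Sequence


def coverage_at_risk(confidences: Sequence[float], accs: Sequence[float], max_risk: float) -> float:
    """Return the largest fraction of questions that can be answered with risk <= max_risk.

    Assumed setup: a question is answered when its confidence clears a threshold,
    and risk = 1 - mean accuracy over the answered questions.
    """
    n = len(accs)
    if n == 0 or len(confidences) != n:
        raise ValueError("need equal-length, non-empty confidences and accs")
    # Most confident first: lowering the threshold answers questions in this order.
    order = sorted(range(n), key=lambda i: confidences[i], reverse=True)
    best = 0.0
    acc_sum = 0.0
    for k, i in enumerate(order, start=1):
        acc_sum += accs[i]
        # A threshold cannot separate tied scores, so only evaluate at tie boundaries.
        if k < n and confidences[order[k]] == confidences[i]:
            continue
        risk = 1.0 - acc_sum / k
        # Risk is not monotone in coverage, so check every cut and keep the largest valid one.
        if risk <= max_risk + 1e-12:  # small tolerance for floating-point error
            best = k / n
    return best


# Hypothetical scores: the third most confident answer is wrong.
scores = [0.95, 0.90, 0.80, 0.40]
accs = [1.0, 1.0, 0.0, 1.0]
print(coverage_at_risk(scores, accs, max_risk=0.01))  # 0.5
```

Because a single wrong answer among the most confident ones caps coverage, a better selection function (one whose scores separate right from wrong answers more cleanly) is what raises coverage at a fixed risk.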

Abstract

Machine learning has advanced dramatically, narrowing the accuracy gap to humans in multimodal tasks like visual question answering (VQA). However, while humans can say "I don't know" when they are uncertain (i.e., abstain from answering a question), such ability has been largely neglected in multimodal research, despite the importance of this problem to the usage of VQA in real settings. In this work, we promote a problem formulation for reliable VQA, where we prefer abstention over providing an incorrect answer. We first enable abstention capabilities for several VQA models, and analyze both their coverage, the portion of questions answered, and risk, the error on that portion. For that, we explore several abstention approaches. We find that although the best performing models achieve over 70% accuracy on the VQA v2 dataset, introducing the option to abstain by directly using a…
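The abstract defines coverage as the portion of questions answered and risk as the error on that portion. A simple way to give a VQA model an abstain option is to threshold its own confidence. The sketch below assumes that confidence is the maximum softmax probability over a fixed answer vocabulary (the page's Methods tag is Softmax) and measures risk as one minus the mean per-question accuracy of the answers given. The threshold values, the toy vocabulary, and all function names are illustrative assumptions.

```python
import math
from typing import List, Optional, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a model's answer logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def answer_or_abstain(logits: Sequence[float], vocab: Sequence[str], threshold: float) -> Optional[str]:
    """Return the top answer if its softmax probability clears the threshold, else None (abstain)."""
    probs = softmax(logits)
    top = max(range(len(probs)), key=probs.__getitem__)
    return vocab[top] if probs[top] >= threshold else None


def coverage_and_risk(answered: Sequence[bool], accs: Sequence[float]) -> Tuple[float, float]:
    """Coverage = fraction answered; risk = 1 - mean accuracy on the answered questions.

    accs[i] is the (assumed) accuracy of the model's top answer to question i.
    Risk is reported as 0.0 when nothing is answered, by convention.
    """
    if not answered or len(answered) != len(accs):
        raise ValueError("need equal-length, non-empty inputs")
    kept = [a for g, a in zip(answered, accs) if g]
    coverage = len(kept) / len(answered)
    risk = 1.0 - sum(kept) / len(kept) if kept else 0.0
    return coverage, risk


vocab = ["yes", "no", "2"]
print(answer_or_abstain([2.0, 0.0, 0.0], vocab, threshold=0.5))  # yes (top probability is about 0.79)
print(answer_or_abstain([2.0, 0.0, 0.0], vocab, threshold=0.9))  # None
print(coverage_and_risk([True, False, True, True], [1.0, 0.0, 0.0, 1.0]))  # coverage 0.75, risk about 0.33
```

Raising the threshold trades coverage for lower risk. The findings describe how little coverage remains when this kind of confidence is pushed down to a 1% risk budget.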

Code & Models

Repositories

facebookresearch/reliable_vqa
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques

Methods: Softmax