Improving Selective Visual Question Answering by Learning from Your Peers

Corentin Dancette; Spencer Whitehead; Rishabh Maheshwary; Ramakrishna Vedantam; Stefan Scherer; Xinlei Chen; Matthieu Cord; Marcus Rohrbach

arXiv:2306.08751·cs.CV·June 16, 2023·1 cites

TL;DR

This paper introduces Learning from Your Peers (LYP), a simple training method that improves selective answering in Visual Question Answering models. The model learns to abstain on questions it is likely to answer incorrectly, in both in-distribution (ID) and mixed ID/out-of-distribution (OOD) scenarios.

Contribution

The paper proposes the LYP approach for training multimodal selection functions that enhance abstention decisions without extra manual labels or data, significantly improving selective prediction performance.

Findings

01

Doubles the previous best coverage at 1% risk of error (C@1%) in ID scenarios, reaching 32.92%.

02

Raises coverage at 1% risk of error (C@1%) in mixed ID/OOD scenarios to 25.38%.

03

Outperforms softmax confidence-based abstention methods.

Abstract

Despite advances in Visual Question Answering (VQA), the ability of models to assess their own correctness remains underexplored. Recent work has shown that VQA models, out-of-the-box, can have difficulties abstaining from answering when they are wrong. The option to abstain, also called Selective Prediction, is highly relevant when deploying systems to users who must trust the system's output (e.g., VQA assistants for users with visual impairments). For such scenarios, abstention can be especially important as users may provide out-of-distribution (OOD) or adversarial inputs that make incorrect answers more likely. In this work, we explore Selective VQA in both in-distribution (ID) and OOD scenarios, where models are presented with mixtures of ID and OOD data. The goal is to maximize the number of questions answered while minimizing the risk of error on those questions. We propose a…
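The trade-off described in the abstract (answer as many questions as possible while keeping the error rate on answered questions low) can be illustrated with a minimal sketch. It assumes the softmax-confidence baseline that the paper compares against and a simplified 0/1 correctness label per question; the function names `softmax` and `coverage_at_risk` are hypothetical and are not taken from the paper or its repository.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def coverage_at_risk(confidence, correct, max_risk):
    """Largest fraction of questions that can be answered while the error
    rate on the answered questions stays at or below max_risk.

    Questions are answered in order of decreasing confidence, which is
    equivalent to sweeping an abstention threshold over the confidences.
    Assumption: `correct` holds 0/1 labels (a simplification for this sketch).
    """
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=float)
    order = np.argsort(-confidence, kind="stable")  # most confident first
    errors = 1.0 - correct[order]
    answered = np.arange(1, len(order) + 1)
    risk = np.cumsum(errors) / answered  # risk after answering the top-k
    ok = np.nonzero(risk <= max_risk)[0]
    if ok.size == 0:
        return 0.0  # no threshold meets the risk target: abstain on everything
    return float(answered[ok[-1]] / len(order))


# Toy data: five questions, the third-most-confident answer is wrong.
conf = [0.9, 0.8, 0.7, 0.6, 0.5]
is_correct = [1, 1, 0, 1, 1]
low_risk_coverage = coverage_at_risk(conf, is_correct, max_risk=0.1)
high_risk_coverage = coverage_at_risk(conf, is_correct, max_risk=0.3)

# Softmax-confidence baseline: use the maximum class probability as confidence.
probs = softmax(np.array([[2.0, 0.0], [0.0, 0.0]]))
baseline_conf = probs.max(axis=-1)
```

In this toy case a strict risk target forces the model to abstain once the first error appears, while a looser target lets it answer everything. A learned selection function would replace `baseline_conf` with its own score; the metric itself stays the same.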

Code & Models

Repositories

facebookresearch/selective-vqa_ood (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques

Methods: Softmax