Improving Selective Visual Question Answering by Learning from Your Peers
Corentin Dancette, Spencer Whitehead, Rishabh Maheshwary, Ramakrishna Vedantam, Stefan Scherer, Xinlei Chen, Matthieu Cord, Marcus Rohrbach

TL;DR
This paper introduces a simple learning method called Learning from Your Peers (LYP) that improves selective answering in Visual Question Answering (VQA) models, in both in-distribution and out-of-distribution scenarios, by learning to abstain from questions the model is uncertain about.
Contribution
The paper proposes the LYP approach for training multimodal selection functions that enhance abstention decisions without extra manual labels or data, significantly improving selective prediction performance.
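The summary says LYP trains a selection function "without extra manual labels or data". One plausible way to get such targets, and our assumption about what "learning from your peers" refers to (this page does not spell it out), is to split the training set into K folds, train a "peer" model on all folds but one, and label each held-out example by whether that peer answers it correctly. The sketch below shows only that label-generation step; `peer_targets`, `train_fn`, and the toy `memorize` model are hypothetical, and exact-match correctness is a simplification of real VQA scoring.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical (question, ground-truth answer) pair; real VQA inputs also include an image.
Example = Tuple[str, str]


def peer_targets(
    data: Sequence[Example],
    k: int,
    train_fn: Callable[[List[Example]], Callable[[str], str]],
) -> List[int]:
    """Label each example 1 if a peer model that never saw it answers correctly, else 0."""
    targets = [0] * len(data)
    for fold in range(k):
        held_out = [i for i in range(len(data)) if i % k == fold]
        train = [data[i] for i in range(len(data)) if i % k != fold]
        peer = train_fn(train)  # peer trained without this fold
        for i in held_out:
            question, answer = data[i]
            targets[i] = int(peer(question) == answer)
    return targets


def memorize(train: List[Example]) -> Callable[[str], str]:
    """Toy stand-in for VQA training: look up answers seen during training."""
    table: Dict[str, str] = dict(train)
    return lambda q: table.get(q, "unknown")


data = [("color?", "red"), ("color?", "red"), ("count?", "two"), ("shape?", "round")]
targets = peer_targets(data, k=2, train_fn=memorize)
```

The held-out split matters because a model scored on its own training data would look almost always correct, leaving the selector nothing useful to learn about when to abstain.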
Findings
Doubles the previous best coverage in ID scenarios, reaching 32.92%.
Raises coverage in mixed ID/OOD scenarios to 25.38%.
Outperforms softmax confidence-based abstention methods.
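The softmax-confidence baseline mentioned above typically works by answering only when the model's top softmax probability clears a threshold. Below is a minimal sketch of that baseline, assuming a model that emits one logit per candidate answer. `answer_or_abstain` and its `threshold` parameter are illustrative names, not taken from the paper.

```python
import math
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def answer_or_abstain(
    logits: Sequence[float], answers: Sequence[str], threshold: float
) -> Optional[str]:
    """Return the top answer if its softmax probability >= threshold, else None (abstain)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return answers[best] if probs[best] >= threshold else None
```

For logits `[2.0, 0.0, 0.0]` the top probability is about 0.79, so the model answers at a threshold of 0.5 and abstains at 0.9. A learned selection function replaces this single confidence score with a separate predictor of correctness.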
Abstract
Despite advances in Visual Question Answering (VQA), the ability of models to assess their own correctness remains underexplored. Recent work has shown that VQA models, out-of-the-box, can have difficulties abstaining from answering when they are wrong. The option to abstain, also called Selective Prediction, is highly relevant when deploying systems to users who must trust the system's output (e.g., VQA assistants for users with visual impairments). For such scenarios, abstention can be especially important as users may provide out-of-distribution (OOD) or adversarial inputs that make incorrect answers more likely. In this work, we explore Selective VQA in both in-distribution (ID) and OOD scenarios, where models are presented with mixtures of ID and OOD data. The goal is to maximize the number of questions answered while minimizing the risk of error on those questions. We propose a…
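The trade-off in the abstract, answering as many questions as possible while keeping the error rate low, is commonly measured as coverage at a fixed risk. A minimal sketch, assuming per-question confidence scores and accuracies in [0, 1]: answer questions in order of decreasing confidence and report the largest fraction answered whose risk (1 minus mean accuracy) stays within the budget. `coverage_at_risk` is a hypothetical helper, and the sketch ignores confidence ties, which a real threshold could not split.

```python
from typing import Sequence


def coverage_at_risk(
    confidences: Sequence[float], accuracies: Sequence[float], max_risk: float
) -> float:
    """Largest fraction of questions answered (most confident first) with risk <= max_risk."""
    order = sorted(range(len(confidences)), key=lambda i: confidences[i], reverse=True)
    best = 0.0
    total_acc = 0.0
    for n, i in enumerate(order, start=1):
        total_acc += accuracies[i]
        risk = 1.0 - total_acc / n  # error rate among the n answered questions
        if risk <= max_risk:
            best = n / len(order)
    return best
```

With confidences `[0.9, 0.8, 0.7, 0.6]` and accuracies `[1, 1, 0, 1]`, a zero-risk budget allows answering only the top two questions (coverage 0.5), while a 25% budget allows answering all four. A small budget such as `max_risk=0.01` mirrors a deployment where wrong answers are costly.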
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
Methods: Softmax
