MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering

Tejas Gokhale; Pratyay Banerjee; Chitta Baral; Yezhou Yang

arXiv:2009.08566 · cs.CV · October 19, 2020 · 6 cites

TL;DR

MUTANT is a training paradigm that exposes a visual question answering model to semantic mutations of its input to improve out-of-distribution generalization. It reaches state-of-the-art accuracy on VQA-CP without relying on knowledge of the train and test answer distributions.

Contribution

The paper proposes MUTANT, a novel training approach that employs semantic input mutations and consistency constraints to enhance OOD generalization in VQA.

Findings

01 Achieves a 10.57% accuracy improvement on the VQA-CP benchmark.

02 Establishes a new state-of-the-art accuracy on VQA-CP.

03 Does not rely on prior knowledge of the train and test answer distributions.

Abstract

While progress has been made on the visual question answering leaderboards, models often utilize spurious correlations and priors in datasets under the i.i.d. setting. As such, evaluation on out-of-distribution (OOD) test samples has emerged as a proxy for generalization. In this paper, we present MUTANT, a training paradigm that exposes the model to perceptually similar, yet semantically distinct mutations of the input, to improve OOD generalization on benchmarks such as the VQA-CP challenge. Under this paradigm, models utilize a consistency-constrained training objective to understand the effect of semantic changes in the input (question-image pair) on the output (answer). Unlike existing methods on VQA-CP, MUTANT does not rely on knowledge of the nature of the train and test answer distributions. MUTANT establishes a new state-of-the-art accuracy on VQA-CP with a $10.57\%$ improvement. Our work…
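
The abstract does not spell out the consistency-constrained objective, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes a classifier that outputs logits over a fixed answer vocabulary, and it penalizes the mismatch between how far apart the two predictions are and how far apart the two ground-truth answers are. The function names, the L1 distance, and the `weight` parameter are all hypothetical choices made for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target_index):
    """Negative log-likelihood of the correct answer."""
    return -math.log(max(probs[target_index], 1e-12))


def l1_distance(p, q):
    """Sum of absolute differences between two vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))


def one_hot(index, size):
    """One-hot vector for an answer index."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def consistency_loss(logits_orig, logits_mut, ans_orig, ans_mut, weight=1.0):
    """Illustrative loss for an (original, mutant) pair of inputs.

    ASSUMPTION: this is a stand-in for the paper's objective. It adds
    (a) the answer loss on both samples and (b) a penalty on the gap
    between the distance separating the two predictions and the
    distance separating the two ground-truth answers.
    """
    p_orig = softmax(logits_orig)
    p_mut = softmax(logits_mut)
    n = len(p_orig)

    answer_term = cross_entropy(p_orig, ans_orig) + cross_entropy(p_mut, ans_mut)

    target_gap = l1_distance(one_hot(ans_orig, n), one_hot(ans_mut, n))
    predicted_gap = l1_distance(p_orig, p_mut)
    consistency_term = abs(target_gap - predicted_gap)

    return answer_term + weight * consistency_term


# Toy example with a 3-answer vocabulary: the mutation changes the
# correct answer from index 0 to index 1.
responsive = consistency_loss([5.0, 0.0, 0.0], [0.0, 5.0, 0.0], 0, 1)
insensitive = consistency_loss([5.0, 0.0, 0.0], [5.0, 0.0, 0.0], 0, 1)
print(responsive < insensitive)
```

In this toy setup, a model whose prediction shifts with the mutated input incurs a lower loss than one that repeats its original answer, which is the behavior a consistency constraint is meant to encourage.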

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques