Rethinking Data Augmentation for Robust Visual Question Answering

Long Chen; Yuhang Zheng; Jun Xiao

arXiv:2207.08739 · cs.CV · September 16, 2022 · 1 citation

TL;DR

This paper introduces KDDAug, a knowledge distillation-based data augmentation method for visual question answering that generates pseudo answers for diverse image-question pairs, improving robustness and generalization across models and datasets.

Contribution

The paper proposes a novel, model-agnostic data augmentation strategy that uses knowledge distillation to generate pseudo answers. This relaxes the constraints on which images and questions can be paired into new training samples and improves the robustness of VQA models.
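The pairing-and-labeling idea can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a trained "teacher" VQA model that returns a probability distribution over candidate answers for any image-question pair. The names `compose_augmented_samples` and `toy_teacher` are hypothetical, and `toy_teacher` is a fixed stand-in rather than a real model. Because the teacher assigns a soft pseudo answer to every pair, no per-question-type heuristic rules are needed to decide which random pairs are usable.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical teacher interface (assumption): (image_id, question) -> {answer: probability}.
Teacher = Callable[[str, str], Dict[str, float]]
Sample = Tuple[str, str, Dict[str, float]]


def compose_augmented_samples(
    images: List[str],
    questions: List[str],
    teacher: Teacher,
    num_samples: int,
    seed: int = 0,
) -> List[Sample]:
    """Randomly pair images with human-written questions and attach the
    teacher's soft answer distribution to each pair as its pseudo answer."""
    rng = random.Random(seed)  # seeded for reproducible pairings
    augmented: List[Sample] = []
    for _ in range(num_samples):
        image = rng.choice(images)
        question = rng.choice(questions)
        # No question-type rules: the teacher labels every pair it is given.
        augmented.append((image, question, teacher(image, question)))
    return augmented


def toy_teacher(image: str, question: str) -> Dict[str, float]:
    """Hypothetical stand-in for a trained VQA model. It ignores the image
    and returns fixed, illustrative answer distributions."""
    if question.startswith("Is "):
        return {"yes": 0.7, "no": 0.3}
    return {"red": 0.5, "blue": 0.5}


samples = compose_augmented_samples(
    images=["img_001", "img_002"],
    questions=["Is there a dog?", "What color is the car?"],
    teacher=toy_teacher,
    num_samples=4,
)
for image, question, pseudo_answer in samples:
    print(image, "|", question, "|", pseudo_answer)
```

In a real pipeline the teacher would be a trained VQA model and the pools would come from the original training set. The sketch only shows that labeling comes from the model rather than from hand-written rules.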

Findings

1. KDDAug improves VQA performance across multiple benchmarks.
2. It enhances robustness to out-of-distribution data.
3. The method is effective with various backbone architectures.

Abstract

Data Augmentation (DA), which generates extra training samples beyond the original training set, has been widely used in today's unbiased VQA models to mitigate language biases. Current mainstream DA strategies are synthetic-based methods, which synthesize new samples either by editing some visual regions or words, or by re-generating them from scratch. However, these synthetic samples are always unnatural and error-prone. To avoid this issue, a recent DA work composes new augmented samples by randomly pairing pristine images with other human-written questions. Unfortunately, to guarantee that the augmented samples have reasonable ground-truth answers, its authors manually design a set of heuristic rules for several question types, which severely limits the method's generalization ability. To this end, we propose a new Knowledge Distillation based Data Augmentation for VQA, dubbed KDDAug. Specifically, we first…
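Knowledge distillation typically trains a student model on a teacher's soft outputs. The following is a generic sketch of a soft-target cross-entropy loss of that kind. It assumes that pseudo answers are probability distributions over an answer vocabulary shared with the student. The abstract excerpt does not state the exact objective KDDAug uses, so treat this loss as an assumption, not the paper's formulation. The function names are hypothetical.

```python
import math
from typing import Dict


def log_softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Numerically stable log-softmax over a dict of answer logits."""
    max_logit = max(logits.values())
    log_norm = max_logit + math.log(
        sum(math.exp(v - max_logit) for v in logits.values())
    )
    return {answer: v - log_norm for answer, v in logits.items()}


def soft_cross_entropy(
    student_logits: Dict[str, float], teacher_probs: Dict[str, float]
) -> float:
    """Cross-entropy between the teacher's soft pseudo answer and the
    student's predicted distribution: -sum_a p_teacher(a) * log p_student(a).
    Assumes every answer in teacher_probs also appears in student_logits."""
    log_probs = log_softmax(student_logits)
    return -sum(p * log_probs[answer] for answer, p in teacher_probs.items())


# Usage: a student that matches a 50/50 teacher incurs a loss of log(2).
loss = soft_cross_entropy({"yes": 0.0, "no": 0.0}, {"yes": 0.5, "no": 0.5})
print(round(loss, 4))  # 0.6931
```

Minimizing this loss pulls the student's distribution toward the teacher's. When the two distributions match, the loss equals the entropy of the teacher's distribution, which is why the 50/50 example yields log(2).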

Code & Models

Repositories

itemzheng/kddaug (PyTorch, official implementation)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling

Methods: Knowledge Distillation