Good, Better, Best: Textual Distractors Generation for Multiple-Choice Visual Question Answering via Reinforcement Learning
Jiaying Lu, Xin Ye, Yi Ren, Yezhou Yang

TL;DR
This paper introduces DG-VQA, an unsupervised task of generating challenging distractors for multiple-choice visual question answering, and tackles it with reinforcement learning that uses pre-trained VQA models for feedback. The generated distractors also serve as data augmentation that improves model robustness.
Contribution
The paper proposes Gobbet, a reinforcement learning framework that generates meaningful distractors without ground-truth training samples, using pre-trained VQA models as the environment that provides feedback.
Findings
Gobbet effectively generates challenging distractors that fool existing VQA models.
Generated distractors improve model robustness through data augmentation.
Manual analysis reveals factors influencing distractor effectiveness.
Abstract
Multiple-choice VQA has recently drawn increasing attention from researchers and end-users. As the demand for automatically constructing large-scale multiple-choice VQA data grows, we introduce a novel task called textual Distractors Generation for VQA (DG-VQA), which focuses on generating challenging yet meaningful distractors given the context image, question, and correct answer. The DG-VQA task aims at generating distractors without ground-truth training samples, since such resources are rarely available. To tackle DG-VQA in an unsupervised manner, we propose Gobbet, a reinforcement learning (RL) based framework that utilizes pre-trained VQA models as an alternative knowledge base to guide the distractor generation process. In Gobbet, a pre-trained VQA model serves as the environment in the RL setting to provide feedback for the input multi-modal query, while a neural distractor generator serves as the…
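The loop described in the abstract, with a VQA model acting as the environment and a distractor generator being trained from its feedback, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the VQA model is replaced by a hand-written score table (`stub_vqa_score`), the image is omitted, the generator is a softmax policy over a fixed candidate list rather than a neural text generator, the reward is the stub model's probability of preferring the distractor over the correct answer, and the update rule is plain REINFORCE with a moving-average baseline.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def stub_vqa_score(question, option):
    """Hypothetical stand-in for a pre-trained VQA model (assumption).

    Returns a plausibility score for `option` as an answer to `question`.
    A real environment would also take the image as input.
    """
    table = {"red": 2.0, "crimson": 1.8, "blue": 0.5, "banana": -2.0}
    return table.get(option, 0.0)


def reward(question, answer, distractor):
    """Assumed reward: probability that the stub model prefers the
    distractor over the correct answer in a two-way choice."""
    scores = [stub_vqa_score(question, answer),
              stub_vqa_score(question, distractor)]
    return softmax(scores)[1]


def train(candidates, question, answer, steps=3000, lr=0.3, seed=0):
    """REINFORCE over a categorical policy on `candidates`.

    Returns the final probability the policy assigns to each candidate.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(candidates)
    baseline = 0.0  # moving average of rewards, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        # Agent action: sample one distractor from the current policy.
        i = rng.choices(range(len(candidates)), weights=probs)[0]
        # Environment feedback from the (stub) VQA model.
        r = reward(question, answer, candidates[i])
        advantage = r - baseline
        baseline += 0.05 * (r - baseline)
        # d log pi(i) / d logit_j = 1[j == i] - probs[j]
        for j in range(len(logits)):
            grad = (1.0 if j == i else 0.0) - probs[j]
            logits[j] += lr * advantage * grad
    return softmax(logits)


if __name__ == "__main__":
    candidates = ["crimson", "blue", "banana"]
    final = train(candidates, "What color is the car?", "red")
    for cand, p in zip(candidates, final):
        print(f"{cand}: {p:.3f}")
```

In this toy setting the policy should shift its probability mass toward the candidate that the stub model finds hardest to tell apart from the correct answer, which is the behavior the abstract attributes to using a VQA model as feedback. The sketch does not address the "meaningful" requirement in any way; it only shows the feedback loop.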
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Speech and dialogue systems
