Discovering the Unknown Knowns: Turning Implicit Knowledge in the Dataset into Explicit Training Examples for Visual Question Answering
Jihyung Kil, Cheng Zhang, Dong Xuan, Wei-Lun Chao

TL;DR
This paper introduces SimpleAug, a data augmentation method that converts knowledge implicit in a VQA dataset into explicit training examples, improving model accuracy on VQA-CP and VQA v2 and reducing reliance on extensive annotations.
Contribution
The paper proposes a novel data augmentation pipeline that leverages implicit dataset knowledge to generate additional training examples for VQA, enhancing model robustness and generalization.
Findings
Improved VQA accuracy on VQA-CP and VQA v2 datasets.
Effective utilization of weakly-labeled and unlabeled images.
Showed that knowledge implicit in the dataset can be systematically exploited.
Abstract
Visual question answering (VQA) is challenging not only because the model has to handle multi-modal information, but also because it is just so hard to collect sufficient training examples -- there are too many questions one can ask about an image. As a result, a VQA model trained solely on human-annotated examples could easily over-fit specific question styles or image contents that are being asked, leaving the model largely ignorant about the sheer diversity of questions. Existing methods address this issue primarily by introducing an auxiliary task such as visual grounding, cycle consistency, or debiasing. In this paper, we take a drastically different approach. We found that many of the "unknowns" to the learned VQA model are indeed "known" in the dataset implicitly. For instance, questions asking about the same object in different images are likely paraphrases; the number of…
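The paraphrase observation in the abstract can be illustrated with a minimal sketch. It is not the authors' SimpleAug pipeline. It assumes each annotated question is already tagged with the object it mentions and that each image comes with a set of object labels. The function name `propagate_questions` and both input formats are hypothetical. The sketch only proposes new (image, question) pairs; how an answer is obtained for each pair is left open here.

```python
from collections import defaultdict


def propagate_questions(annotations, image_objects):
    """Propose new (image_id, question) pairs by reusing questions across images.

    annotations:   list of (image_id, question, mentioned_object) triples
                   (hypothetical format, assumed for illustration).
    image_objects: dict mapping image_id -> set of object labels in that image
                   (hypothetical format, assumed for illustration).
    """
    # Group the annotated questions by the object they ask about.
    questions_by_object = defaultdict(set)
    for _, question, obj in annotations:
        questions_by_object[obj].add(question)

    # Pairs that are already annotated, or already proposed, are skipped.
    seen = {(image_id, question) for image_id, question, _ in annotations}

    candidates = []
    for image_id in sorted(image_objects):
        for obj in sorted(image_objects[image_id]):
            for question in sorted(questions_by_object.get(obj, ())):
                pair = (image_id, question)
                if pair not in seen:
                    seen.add(pair)
                    candidates.append(pair)
    return candidates


# Toy data: two images contain a dog, each with one annotated question.
annotations = [
    ("img1", "What color is the dog?", "dog"),
    ("img2", "Is the dog sleeping?", "dog"),
]
image_objects = {"img1": {"dog"}, "img2": {"dog", "cat"}, "img3": {"cat"}}
new_pairs = propagate_questions(annotations, image_objects)
```

In this toy setup each dog question is proposed for the other dog image, while the cat-only image receives nothing because no annotated question mentions a cat.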
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
