Question Generation for Evaluating Cross-Dataset Shifts in Multi-modal Grounding
Arjun R. Akula

TL;DR
This paper introduces a visual question generation (VQG) module that automatically creates out-of-distribution (OOD) shifts, so that the cross-dataset adaptation of visual question answering (VQA) models can be evaluated systematically and the source of a domain mismatch can be traced.
Contribution
It proposes a VQG module that generates OOD shifts, enabling systematic evaluation of cross-dataset adaptation in VQA models.
Findings
The VQG module effectively generates diverse OOD shifts.
It helps identify specific domain mismatches affecting model performance.
The approach improves understanding of cross-dataset generalization in VQA.
Abstract
Visual question answering (VQA) is the multi-modal task of answering natural language questions about an input image. Cross-dataset adaptation methods make it possible to transfer knowledge from a source dataset with a large number of training samples to a target dataset whose training set is limited. When a VQA model trained on one dataset fails to adapt to another, it is hard to identify the underlying cause of the domain mismatch, as there could be a multitude of reasons, such as image distribution mismatch and question distribution mismatch. At UCLA, we are working on a visual question generation (VQG) module that automatically generates out-of-distribution (OOD) shifts to aid in systematically evaluating the cross-dataset adaptation capabilities of VQA models.
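To make "question distribution mismatch" concrete, the sketch below compares two sets of questions by the distribution of their opening words and reports the Jensen-Shannon divergence between them. This is an illustrative diagnostic only, not the VQG module described in the abstract: the function names, the two-token prefix heuristic, and the sample questions are all assumptions introduced here.

```python
import math
from collections import Counter


def question_type_distribution(questions, prefix_len=2):
    """Normalised histogram over the first `prefix_len` lowercased tokens.

    The prefix is a crude, assumed proxy for question type
    (e.g. "what color", "how many", "is the").
    """
    if not questions:
        raise ValueError("questions must be non-empty")
    counts = Counter(" ".join(q.lower().split()[:prefix_len]) for q in questions)
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two discrete distributions.

    Returns 0 for identical distributions and 1 for distributions
    with disjoint support.
    """
    keys = set(p) | set(q)
    mixture = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mixture(dist):
        # Terms with zero probability contribute nothing to the sum.
        return sum(v * math.log2(v / mixture[k]) for k, v in dist.items() if v > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


# Hypothetical question samples standing in for a source and a target dataset.
source_questions = ["What color is the cat?", "How many dogs are there?"]
target_questions = ["Is the man wearing a hat?", "What color is the car?"]

mismatch = js_divergence(
    question_type_distribution(source_questions),
    question_type_distribution(target_questions),
)
print(f"question-type JS divergence: {mismatch:.3f}")
```

A value near 0 would suggest the two datasets ask similar kinds of questions, so an adaptation failure is more likely to stem from the images; a value near 1 points toward the questions. A surface-level prefix statistic like this cannot separate the two factors the way controlled, generated shifts are intended to.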
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Speech and dialogue systems
