Few-Shot Visual Question Generation: A Novel Task and Benchmark Datasets
Anurag Roy, David Johnson Ekka, Saptarshi Ghosh, Abir Das

TL;DR
This paper introduces the new Few-Shot Visual Question Generation (FS-VQG) task, benchmarks existing methods, and provides a new dataset, highlighting the challenges and limitations of current models in few-shot vision and language generation.
Contribution
The paper defines the FS-VQG task, evaluates existing approaches on it, and creates the VQG-23 dataset for few-shot scenarios, supporting research on low-data visual question generation.
Findings
Existing VQG models struggle with few-shot learning.
Neither transfer learning nor meta-learning is sufficient on its own for FS-VQG.
The new dataset VQG-23 enables better evaluation of few-shot methods.
Abstract
Generating natural language questions from visual scenes, known as Visual Question Generation (VQG), has been explored in the recent past where large amounts of meticulously labeled data provide the training corpus. However, in practice, it is not uncommon to have only a few images with question annotations corresponding to a few types of answers. In this paper, we propose a new and challenging Few-Shot Visual Question Generation (FS-VQG) task and provide a comprehensive benchmark for it. Specifically, we evaluate various existing VQG approaches as well as popular few-shot solutions based on meta-learning and self-supervised strategies for the FS-VQG task. We conduct experiments on two popular existing datasets, VQG and Visual7w. In addition, we have also cleaned and extended the VQG dataset for use in a few-shot scenario, with additional image-question pairs as well as additional answer…
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
