Multi-VQG: Generating Engaging Questions for Multiple Images
Min-Hsuan Yeh, Vincent Chen, Ting-Hao 'Kenneth' Huang, Lun-Wei Ku

TL;DR
This paper introduces MVQG, a new dataset, together with baseline models for generating engaging questions from image sequences rather than single images. Building a story behind the sequence is what lets the models produce engaging questions.
Contribution
The paper presents a novel dataset and baseline models for multi-image question generation, highlighting the importance of story understanding in visual question generation.
Findings
Models can generate engaging questions by understanding image sequences.
Story-based question generation outperforms single-image approaches.
The dataset enables new research directions in visual storytelling and question generation.
Abstract
Generating engaging content has drawn much recent attention in the NLP community. Asking questions is a natural way to respond to photos and promote awareness. However, most answers to questions in traditional question-answering (QA) datasets are factoids, which reduce individuals' willingness to answer. Furthermore, traditional visual question generation (VQG) confines the source data for question generation to single images, resulting in a limited ability to comprehend time-series information of the underlying event. In this paper, we propose generating engaging questions from multiple images. We present MVQG, a new dataset, and establish a series of baselines, including both end-to-end and dual-stage architectures. Results show that building stories behind the image sequence enables models to generate engaging questions, which confirms our assumption that people typically construct a…
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization
