MultiQG-TI: Towards Question Generation from Multi-modal Sources
Zichao Wang, Richard Baraniuk

TL;DR
This paper introduces MultiQG-TI, a method for automatic question generation from multi-modal sources that contain both images and text. It extends text-only question generators by converting the visual content of an image into text that the generator can consume.
Contribution
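The paper presents a simple yet effective approach to generating questions from multi-modal data: an image-to-text model and an OCR model turn the image into text, which is fed together with the input text to a question generator. Only the question generator is fine-tuned; the other components stay fixed.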
Findings
MultiQG-TI outperforms ChatGPT with few-shot prompting on ScienceQA.
Both visual and textual signals are essential for effective question generation.
Modeling choices significantly impact the quality of generated questions.
Abstract
We study the new problem of automatic question generation (QG) from multi-modal sources containing images and texts, significantly expanding the scope of most of the existing work that focuses exclusively on QG from only textual sources. We propose a simple solution for our new problem, called MultiQG-TI, which enables a text-only question generator to process visual input in addition to textual input. Specifically, we leverage an image-to-text model and an optical character recognition model to obtain the textual description of the image and extract any texts in the image, respectively, and then feed them together with the input texts to the question generator. We only fine-tune the question generator while keeping the other components fixed. On the challenging ScienceQA dataset, we demonstrate that MultiQG-TI significantly outperforms ChatGPT with few-shot prompting, despite having…
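The pipeline described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the names `MultiModalSource`, `build_generator_input`, `fake_captioner`, and `fake_ocr`, as well as the field labels ("context:", "image description:", "text in image:"), are hypothetical, since the summary does not state the actual input format. The image-to-text and OCR models are passed in as plain callables so that any fixed, pre-trained model could be substituted; only the downstream question generator would be fine-tuned.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class MultiModalSource:
    """One input example: background text plus an optional image (raw bytes here)."""
    text: str
    image: Optional[bytes] = None


def build_generator_input(
    source: MultiModalSource,
    describe_image: Callable[[bytes], str],
    extract_image_text: Callable[[bytes], str],
) -> str:
    """Flatten a multi-modal source into one string for a text-only question generator.

    The field labels below are assumptions made for illustration only.
    """
    parts = [f"context: {source.text.strip()}"]
    if source.image is not None:
        # Both visual models are frozen; they only translate the image into text.
        caption = describe_image(source.image).strip()
        ocr_text = extract_image_text(source.image).strip()
        if caption:
            parts.append(f"image description: {caption}")
        if ocr_text:
            parts.append(f"text in image: {ocr_text}")
    return " ".join(parts)


# Stand-ins for real image-to-text and OCR models (hypothetical outputs).
def fake_captioner(image: bytes) -> str:
    return "a diagram of a food web"


def fake_ocr(image: bytes) -> str:
    return "grass rabbit fox"


example = MultiModalSource(text="Food webs show how energy moves.", image=b"\x89PNG")
prompt = build_generator_input(example, fake_captioner, fake_ocr)
print(prompt)
```

The resulting string is what a fine-tuned text-to-text question generator would receive as input. Keeping the visual models behind simple callables mirrors the design described above, where those components are reused as-is and never updated during training.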
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
