MultiQG-TI: Towards Question Generation from Multi-modal Sources

Zichao Wang; Richard Baraniuk

arXiv:2307.04643·cs.CL·July 11, 2023

TL;DR

This paper introduces MultiQG-TI, a method for automatic question generation from multi-modal sources combining images and text, improving over existing text-only approaches by integrating visual information.

Contribution

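The paper presents a simple yet effective approach to generating questions from multi-modal data: an image-to-text model and an OCR model convert the image into text, which is fed together with the input text to a text-only question generator. Only the question generator is fine-tuned; the other components stay fixed.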

Findings

1. MultiQG-TI outperforms ChatGPT with few-shot prompting on ScienceQA.
2. Both visual and textual signals are essential for effective question generation.
3. Modeling choices significantly impact the quality of generated questions.

Abstract

We study the new problem of automatic question generation (QG) from multi-modal sources containing images and texts, significantly expanding the scope of most of the existing work that focuses exclusively on QG from only textual sources. We propose a simple solution for our new problem, called MultiQG-TI, which enables a text-only question generator to process visual input in addition to textual input. Specifically, we leverage an image-to-text model and an optical character recognition model to obtain the textual description of the image and extract any texts in the image, respectively, and then feed them together with the input texts to the question generator. We only fine-tune the question generator while keeping the other components fixed. On the challenging ScienceQA dataset, we demonstrate that MultiQG-TI significantly outperforms ChatGPT with few-shot prompting, despite having…
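
The data flow described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `build_qg_input`, the field labels, and the newline-joined prompt layout are all assumptions, and the two image models are passed in as plain callables so that no specific library is implied. The fine-tuned question generator itself is left out; the sketch only shows how the image description and OCR text might be merged with the input text before generation.

```python
from typing import Callable


def build_qg_input(
    context: str,
    image: bytes,
    describe_image: Callable[[bytes], str],
    read_image_text: Callable[[bytes], str],
) -> str:
    """Merge input text, image description, and OCR text into one string.

    The labels and layout are hypothetical; the source does not specify
    the template used by MultiQG-TI.
    """
    fields = [
        ("context", context.strip()),
        # Frozen image-to-text model: textual description of the image.
        ("image description", describe_image(image).strip()),
        # Frozen OCR model: any text that appears inside the image.
        ("image text", read_image_text(image).strip()),
    ]
    # Skip fields that came back empty (e.g. an image with no text in it).
    return "\n".join(f"{label}: {value}" for label, value in fields if value)


if __name__ == "__main__":
    # Stand-in callables; a real setup would wrap actual models here.
    prompt = build_qg_input(
        "Plants convert light into chemical energy.",
        b"",
        describe_image=lambda img: "a diagram of a leaf with arrows",
        read_image_text=lambda img: "sunlight, CO2, O2",
    )
    print(prompt)
```

The resulting string would then be passed to the text-only question generator, which is the only component the abstract says is fine-tuned.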

Code & Models

Repositories

moonlightlane/multiqg-ti (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques