End-to-End Video Question-Answer Generation with Generator-Pretester Network
Hung-Ting Su, Chen-Hsi Chang, Po-Wei Shen, Yu-Siang Wang, Ya-Liang Chang, Yu-Cheng Chang, Pu-Jen Cheng, Winston H. Hsu

TL;DR
This paper introduces a Generator-Pretester Network that generates question-answer pairs directly from videos. Video QA models trained on the generated pairs surpass supervised baselines, and in semi-supervised settings the approach outperforms CapQG and transfer learning.
Contribution
The paper presents a novel Generator-Pretester Network that jointly generates and verifies question-answer pairs from videos, achieving state-of-the-art question generation performance and improving Video QA training.
Findings
Achieves state-of-the-art question generation performance.
Surpasses supervised baselines using generated QA pairs.
Outperforms CapQG and transfer learning in semi-supervised settings.
Abstract
We study a novel task, Video Question-Answer Generation (VQAG), for the challenging Video Question Answering (Video QA) task in multimedia. Due to expensive data annotation costs, many widely used, large-scale Video QA datasets such as Video-QA, MSVD-QA and MSRVTT-QA are automatically annotated using Caption Question Generation (CapQG), which takes captions as input instead of the video itself. As captions neither fully represent a video nor are always available in practice, it is crucial to generate question-answer pairs from the video itself via Video Question-Answer Generation (VQAG). Existing video-to-text (V2T) approaches, despite taking a video as the input, generate only a question. In this work, we propose a novel model, the Generator-Pretester Network, that focuses on two components: (1) the Joint Question-Answer Generator (JQAG), which generates a question with its corresponding answer to…
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Human Pose and Action Recognition
