End-to-End Video Question-Answer Generation with Generator-Pretester Network

Hung-Ting Su, Chen-Hsi Chang, Po-Wei Shen, Yu-Siang Wang, Ya-Liang Chang, Yu-Cheng Chang, Pu-Jen Cheng, Winston H. Hsu

arXiv:2101.01447 · cs.MM · January 6, 2021

TL;DR

This paper introduces a Generator-Pretester Network that generates question-answer pairs directly from videos. Video QA models trained on the generated pairs surpass supervised baselines, and the approach outperforms CapQG and transfer learning in semi-supervised settings.

Contribution

The paper presents a novel Generator-Pretester Network that jointly generates and verifies question-answer pairs from videos, achieving state-of-the-art question generation performance and improving Video QA training.

Findings

1. Achieves state-of-the-art question generation performance.

2. Surpasses supervised baselines using generated QA pairs.

3. Outperforms CapQG and transfer learning in semi-supervised settings.

Abstract

We study a novel task, Video Question-Answer Generation (VQAG), for the challenging Video Question Answering (Video QA) task in multimedia. Due to expensive data annotation costs, many widely used, large-scale Video QA datasets such as Video-QA, MSVD-QA and MSRVTT-QA are automatically annotated using Caption Question Generation (CapQG), which takes captions as input instead of the video itself. As captions neither fully represent a video nor are always practically available, it is crucial to generate question-answer pairs directly from a video via Video Question-Answer Generation (VQAG). Existing video-to-text (V2T) approaches, despite taking a video as input, generate only a question. In this work, we propose a novel model, the Generator-Pretester Network, that focuses on two components: (1) The Joint Question-Answer Generator (JQAG), which generates a question with its corresponding answer to…

Code & Models

Repositories

htsucml/VQAG (Official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Human Pose and Action Recognition