Leveraging Video Descriptions to Learn Video Question Answering

Kuo-Hao Zeng; Tseng-Hung Chen; Ching-Yao Chuang; Yuan-Hong Liao; Juan Carlos Niebles; Min Sun

arXiv:1611.04021 · cs.CV · December 20, 2016

TL;DR

This paper introduces a scalable method for training video question answering models using automatically generated QA pairs from online videos and descriptions, enhanced by a self-paced learning approach to handle noisy data.

Contribution

The paper presents a scalable framework for training video QA models on web data, together with a self-paced learning strategy that makes training robust to imperfect, automatically generated annotations.

Findings

01

Self-paced learning effectively filters noisy QA pairs.

02

Extended SS model outperforms baseline methods.

03

Approach achieves promising results on manually annotated QA pairs.

Abstract

We propose a scalable approach to learning video-based question answering (QA): answering a "free-form natural language question" about the content of a video. Our approach automatically harvests a large number of videos and descriptions freely available online. Candidate QA pairs are then generated automatically from the descriptions rather than annotated manually. Next, we use these candidate QA pairs to train several video-based QA methods extended from MN (Sukhbaatar et al. 2015), VQA (Antol et al. 2015), SA (Yao et al. 2015), and SS (Venugopalan et al. 2015). To handle imperfect candidate QA pairs, we propose a self-paced learning procedure that iteratively identifies them and mitigates their effect during training. Finally, we evaluate performance on manually generated video-based QA pairs. The results show that our self-paced learning procedure is effective, and the extended…
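The abstract describes the self-paced procedure only at a high level. As a hedged illustration, here is a minimal sketch of generic self-paced learning with hard 0/1 sample weights on a toy problem: each round keeps only the samples whose loss falls below a threshold `lam`, refits on them, and then raises `lam` so harder samples can be admitted later. This is not the authors' procedure. The function names (`select_easy`, `self_paced_train`), the scalar "model", the warm start on all data, and the geometric growth schedule are hypothetical choices made for this sketch; a real video QA model would replace the toy `fit` and `loss_fn` with training on candidate QA pairs.

```python
from typing import Callable, List, Sequence, Tuple


def select_easy(losses: Sequence[float], lam: float) -> List[int]:
    """Hard self-paced weights: keep sample i (v_i = 1) only if its loss is below lam."""
    return [i for i, loss in enumerate(losses) if loss < lam]


def self_paced_train(
    data: Sequence[float],
    fit: Callable[[Sequence[float]], float],
    loss_fn: Callable[[float, float], float],
    lam: float,
    growth: float = 1.5,
    rounds: int = 5,
) -> Tuple[float, List[int]]:
    """Alternate between selecting 'easy' samples and refitting on them.

    Hypothetical toy setup: the 'model' is a single float and `fit` maps a
    list of samples to a model. Samples whose loss stays above the growing
    threshold lam are treated as likely noise and excluded from fitting.
    """
    model = fit(data)  # warm start on all samples (an assumption of this sketch)
    selected = list(range(len(data)))
    for _ in range(rounds):
        losses = [loss_fn(model, x) for x in data]
        selected = select_easy(losses, lam)
        if not selected:  # nothing is easy enough; keep the current model
            break
        model = fit([data[i] for i in selected])
        lam *= growth  # admit progressively harder samples in later rounds
    return model, selected


def mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def sq_err(m: float, x: float) -> float:
    return (m - x) ** 2


if __name__ == "__main__":
    # Toy data: five consistent samples plus one outlier standing in for a noisy QA pair.
    data = [1.0, 1.1, 0.9, 1.05, 0.95, 10.0]
    model, kept = self_paced_train(data, mean, sq_err, lam=4.0)
    print(kept, round(model, 3))
```

On this toy data the outlier's squared error (about 81 once the model settles near 1.0) stays above every threshold used in the five rounds (4, 6, 9, 13.5, 20.25), so it never influences the fit. With more rounds `lam` would eventually exceed 81 and the outlier would be readmitted, which is why the threshold schedule is a key design choice in any self-paced variant.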
