TL;DR
The paper introduces MovieQA, a dataset of 14,944 multiple-choice questions about 408 movies for evaluating automatic story comprehension from both video and text, drawing on video clips, plots, subtitles, scripts, and DVS.
Contribution
It contributes a large-scale, publicly released dataset with semantically diverse questions and multiple information sources, together with an evaluation benchmark for movie story understanding.
Findings
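Question-answering with open-ended semantics is hard.
Questions range from simple "Who" did "What" to "Whom" to "Why" and "How" certain events occurred, and draw on five sources: video clips, plots, subtitles, scripts, and DVS.
Extensions of existing QA techniques, used as baselines, show limited performance on this task.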
Abstract
We introduce the MovieQA dataset which aims to evaluate automatic story comprehension from both video and text. The dataset consists of 14,944 questions about 408 movies with high semantic diversity. The questions range from simpler "Who" did "What" to "Whom", to "Why" and "How" certain events occurred. Each question comes with a set of five possible answers; a correct one and four deceiving answers provided by human annotators. Our dataset is unique in that it contains multiple sources of information -- video clips, plots, subtitles, scripts, and DVS. We analyze our data through various statistics and methods. We further extend existing QA techniques to show that question-answering with such open-ended semantics is hard. We make this data set public along with an evaluation benchmark to encourage inspiring work in this challenging domain.
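The five-way multiple-choice format described in the abstract can be sketched as follows. This is a minimal Python illustration, not the official MovieQA data format or evaluation code: the class name, field names, and toy items are assumptions made for the example. Accuracy (the fraction of questions answered correctly) is assumed here as the metric; with five candidates per question, uniform random guessing is correct 20% of the time in expectation.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class MultipleChoiceQuestion:
    """One five-way multiple-choice item (hypothetical schema, not the official one)."""
    movie: str
    question: str
    answers: Tuple[str, ...]  # five candidates: one correct, four deceiving
    correct_index: int        # position of the correct answer in `answers`

    def __post_init__(self) -> None:
        if len(self.answers) != 5:
            raise ValueError("expected exactly five candidate answers")
        if not 0 <= self.correct_index < 5:
            raise ValueError("correct_index must be between 0 and 4")


# Expected accuracy of uniform random guessing over five candidates.
CHANCE_ACCURACY = 1 / 5


def accuracy(questions: Sequence[MultipleChoiceQuestion],
             predictions: Sequence[int]) -> float:
    """Return the fraction of questions whose predicted index is correct."""
    if not questions:
        raise ValueError("no questions to evaluate")
    if len(questions) != len(predictions):
        raise ValueError("need exactly one prediction per question")
    hits = sum(q.correct_index == p for q, p in zip(questions, predictions))
    return hits / len(questions)


if __name__ == "__main__":
    # Toy items invented for illustration; they are not taken from MovieQA.
    toy = [
        MultipleChoiceQuestion("Movie A", "Who helps the hero escape?",
                               ("a", "b", "c", "d", "e"), 2),
        MultipleChoiceQuestion("Movie B", "Why does the heist fail?",
                               ("a", "b", "c", "d", "e"), 0),
    ]
    print(accuracy(toy, [2, 1]))  # one of two predictions is correct: 0.5
```

Comparing a model's accuracy against `CHANCE_ACCURACY` gives a quick sense of how far it is above guessing.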
Videos
MovieQA: Understanding Stories in Movies Through Question-Answering (YouTube)
