Knowledge-Based Video Question Answering with Unsupervised Scene Descriptions

Noa Garcia; Yuta Nakashima

arXiv:2007.08751·cs.CV·July 20, 2020

TL;DR

This paper introduces ROLL, a model for knowledge-based video question answering that combines dialogue understanding, unsupervised scene descriptions, and external knowledge, achieving state-of-the-art results on the KnowIT VQA and TVQA+ datasets.

Contribution

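The paper presents ROLL, a multi-task framework with three tasks (dialogue processing, unsupervised scene description generation, and weakly supervised external knowledge retrieval) whose outputs are encoded with Transformers and fused by a modality weighting mechanism.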

Findings

01. Achieves state-of-the-art results on the KnowIT VQA and TVQA+ datasets.

02. Combines multiple sources of information through Transformer-based encoding and a modality weighting mechanism.

03. Demonstrates the importance of unsupervised scene descriptions and external knowledge for video understanding.

Abstract

To understand movies, humans constantly reason over the dialogues and actions shown in specific scenes and relate them to the overall storyline already seen. Inspired by this behaviour, we design ROLL, a model for knowledge-based video story question answering that leverages three crucial aspects of movie understanding: dialogue comprehension, scene reasoning, and storyline recalling. In ROLL, each of these tasks is in charge of extracting rich and diverse information by 1) processing scene dialogues, 2) generating unsupervised video scene descriptions, and 3) obtaining external knowledge in a weakly supervised fashion. To answer a given question correctly, the information generated by each cognitively inspired task is encoded via Transformers and fused through a modality weighting mechanism, which balances the information from the different sources. Exhaustive evaluation demonstrates the…
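The fusion step described in the abstract can be illustrated with a small sketch. It assumes each of the three tasks has already produced one score per candidate answer, and that the scores are balanced by a softmax over one importance value per task. The function names, the dictionary layout, and the softmax weighting are all assumptions made for illustration; they are not taken from the paper or from the noagarcia/ROLL-VideoQA repository, whose actual mechanism may differ.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_branch_scores(branch_scores, branch_logits):
    """Weighted sum of per-candidate scores from several tasks.

    branch_scores: dict mapping a task name to a list of scores,
                   one per candidate answer (hypothetical layout).
    branch_logits: dict mapping the same names to an unnormalised
                   importance value (learned in a real model, fixed here).
    Returns (fused_scores, normalised_weights).
    """
    names = sorted(branch_scores)
    weights = softmax([branch_logits[name] for name in names])
    n_candidates = len(branch_scores[names[0]])
    fused = [0.0] * n_candidates
    for name, weight in zip(names, weights):
        scores = branch_scores[name]
        if len(scores) != n_candidates:
            raise ValueError(
                f"task {name!r} has {len(scores)} scores, expected {n_candidates}"
            )
        for i, score in enumerate(scores):
            fused[i] += weight * score
    return fused, dict(zip(names, weights))


def predict(branch_scores, branch_logits):
    """Index of the candidate answer with the highest fused score."""
    fused, _ = fuse_branch_scores(branch_scores, branch_logits)
    return max(range(len(fused)), key=fused.__getitem__)


# Illustrative, made-up scores for four candidate answers.
example_scores = {
    "dialogue": [0.2, 1.5, 0.1, 0.3],
    "scene": [0.4, 0.6, 0.9, 0.1],
    "knowledge": [0.1, 2.0, 0.2, 0.5],
}
example_logits = {"dialogue": 0.0, "scene": 0.0, "knowledge": 0.0}
example_choice = predict(example_scores, example_logits)
```

With equal importance values the fused score is the plain average of the three tasks; raising one task's value shifts the decision toward that source, which is the balancing behaviour the abstract attributes to modality weighting.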

Code & Models

Repositories

noagarcia/ROLL-VideoQA (PyTorch, official)
