FriendsQA: A New Large-Scale Deep Video Understanding Dataset with Fine-grained Topic Categorization for Story Videos

Zhengqian Wu; Ruizhe Li; Zijun Xu; Zhongyuan Wang; Chunxia Xiao; Chao Liang

arXiv:2412.17022·cs.CV·December 24, 2024

TL;DR

This paper introduces FriendsQA, a large-scale deep video understanding dataset with fine-grained topic categorization for story videos, enabling better assessment of VideoQA models' comprehension of complex storylines.

Contribution

It presents a novel dataset created using a language model-based framework, with detailed topic annotations, to evaluate deep video understanding in story videos.

Findings

1. State-of-the-art models show varied performance on FriendsQA.
2. The dataset reveals challenges in deep understanding of complex storylines.
3. FriendsQA enables more comprehensive evaluation of VideoQA models.

Abstract

Video question answering (VideoQA) aims to answer natural language questions about given videos. Although existing models perform well on the factoid VideoQA task, they still face challenges in the deep video understanding (DVU) task, which focuses on story videos. Compared to factoid videos, the most significant feature of story videos is their storylines, which are composed of complex interactions and the long-range evolution of core story topics, including characters, actions, and locations. Understanding these topics requires models to possess DVU capability. However, existing DVU datasets rarely organize questions according to these story topics, making it difficult to comprehensively assess VideoQA models' DVU capability on complex storylines. Additionally, the question quantity and video length of these datasets are limited by the high labor costs of handcrafted dataset construction.…

Code & Models

Repositories

nercms-mmap/friendsqa (Official)

Videos

FriendsQA: A New Large-Scale Deep Video Understanding Dataset with Fine-grained Topic Categorization for Story Videos

Taxonomy

Topics: Video Analysis and Summarization · Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition