FunQA: Towards Surprising Video Comprehension

Binzhu Xie; Sicheng Zhang; Zitang Zhou; Bo Li; Yuanhan Zhang; Jack Hessel; Jingkang Yang; Ziwei Liu

arXiv:2306.14899 · cs.CV · March 25, 2024 · 1 citation

TL;DR

FunQA introduces a comprehensive video question-answering dataset focused on surprising, humorous, creative, and magic videos, aiming to evaluate and improve models' understanding of counter-intuitive content through diverse reasoning tasks.

Contribution

The paper presents FunQA, a new large-scale benchmark for challenging video reasoning tasks involving surprising videos, and proposes FunMentor, a dialogue-based agent to enhance vision-language model understanding.

Findings

01. Existing models show significant performance gaps on FunQA tasks.

02. FunMentor improves model understanding of counter-intuitive video content.

03. FunQA covers diverse reasoning tasks across 312K QA pairs from 4.3K videos.

Abstract

Surprising videos, such as funny clips, creative performances, or visual illusions, attract significant attention. Enjoyment of these videos is not simply a response to visual stimuli; rather, it hinges on the human capacity to understand (and appreciate) commonsense violations depicted in these videos. We introduce FunQA, a challenging video question-answering (QA) dataset specifically designed to evaluate and enhance the depth of video reasoning based on counter-intuitive and fun videos. Unlike most video QA benchmarks which focus on less surprising contexts, e.g., cooking or instructional videos, FunQA covers three previously unexplored types of surprising videos: 1) HumorQA, 2) CreativeQA, and 3) MagicQA. For each subset, we establish rigorous QA tasks designed to assess the model's capability in counter-intuitive timestamp localization, detailed video description, and reasoning…
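The timestamp localization task mentioned in the abstract asks a model to predict the time span in which the counter-intuitive moment occurs. A minimal sketch of how such a prediction could be scored is given below, assuming a temporal intersection-over-union (IoU) measure over `(start, end)` intervals in seconds. The excerpt above does not state which metric FunQA uses, so the function name `temporal_iou` and the interval format are illustrative assumptions, not the dataset's official evaluation code.

```python
def temporal_iou(pred, gold):
    """Intersection-over-union of two (start, end) intervals in seconds.

    Hypothetical helper for illustration; not taken from the FunQA codebase.
    """
    p0, p1 = pred
    g0, g1 = gold
    if p1 < p0 or g1 < g0:
        raise ValueError("interval end must not precede start")
    # Overlap is zero when the intervals are disjoint.
    inter = max(0.0, min(p1, g1) - max(p0, g0))
    # Union = sum of both lengths minus the overlap counted twice.
    union = (p1 - p0) + (g1 - g0) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Predicted span 2s-6s against an annotated span 4s-8s.
    print(temporal_iou((2.0, 6.0), (4.0, 8.0)))
```

A score of 1.0 means the predicted span matches the annotation exactly, and 0.0 means the two spans do not overlap at all.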

Code & Models

Repositories

jingkang50/funqa
none · Official

Datasets

fesvhtr/FunQA
dataset · 252 dl

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Video Analysis and Summarization

Methods: Focus