FunQA: Towards Surprising Video Comprehension
Binzhu Xie, Sicheng Zhang, Zitang Zhou, Bo Li, Yuanhan Zhang, Jack Hessel, Jingkang Yang, Ziwei Liu

TL;DR
FunQA is a video question-answering dataset built on surprising videos (humorous, creative, and magic clips). It aims to evaluate and improve models' understanding of counter-intuitive content through diverse reasoning tasks.
Contribution
The paper presents FunQA, a new large-scale benchmark for challenging video reasoning tasks involving surprising videos, and proposes FunMentor, a dialogue-based agent to enhance vision-language model understanding.
Findings
Existing models show significant performance gaps on FunQA tasks.
FunMentor improves model understanding of counter-intuitive video content.
FunQA covers diverse reasoning tasks across 312K QA pairs from 4.3K videos.
Abstract
Surprising videos, such as funny clips, creative performances, or visual illusions, attract significant attention. Enjoyment of these videos is not simply a response to visual stimuli; rather, it hinges on the human capacity to understand (and appreciate) commonsense violations depicted in these videos. We introduce FunQA, a challenging video question-answering (QA) dataset specifically designed to evaluate and enhance the depth of video reasoning based on counter-intuitive and fun videos. Unlike most video QA benchmarks which focus on less surprising contexts, e.g., cooking or instructional videos, FunQA covers three previously unexplored types of surprising videos: 1) HumorQA, 2) CreativeQA, and 3) MagicQA. For each subset, we establish rigorous QA tasks designed to assess the model's capability in counter-intuitive timestamp localization, detailed video description, and reasoning…
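As an illustration of how a counter-intuitive timestamp localization answer might be scored, the sketch below computes temporal intersection-over-union (IoU) between a predicted and a reference time span. Using IoU as the metric is an assumption made here for illustration; the text above does not state how FunQA scores this task, and the function name `temporal_iou` and the (start, end) seconds format are hypothetical.

```python
def temporal_iou(pred, gold):
    """Temporal IoU of two (start, end) intervals given in seconds.

    Assumption: localization quality is measured by interval overlap;
    this is an illustrative metric, not one confirmed by the source text.
    """
    p_start, p_end = pred
    g_start, g_end = gold
    if p_end < p_start or g_end < g_start:
        raise ValueError("interval end must not precede its start")

    # Overlap is zero when the intervals are disjoint.
    intersection = max(0.0, min(p_end, g_end) - max(p_start, g_start))
    union = (p_end - p_start) + (g_end - g_start) - intersection

    # Two zero-length intervals have no union; treat the score as 0.
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical prediction vs. reference span for one clip.
    print(temporal_iou((2.0, 6.0), (4.0, 8.0)))
```

A score of 1.0 means the predicted span matches the reference exactly, and 0.0 means the two spans do not overlap at all.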
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Video Analysis and Summarization
Methods: Focus
