CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding
Guo Chen, Yicheng Liu, Yifei Huang, Yuping He, Baoqi Pei, Jilan Xu, Yali Wang, Tong Lu, Limin Wang

TL;DR
CG-Bench is a comprehensive long video understanding benchmark that emphasizes clue-grounded question answering, addressing limitations of existing short-video focused benchmarks and multiple-choice evaluations.
Contribution
It introduces the largest long video analysis benchmark with novel clue-grounded evaluation methods to better assess true understanding of videos by multimodal large language models.
Findings
Current models underperform on long videos compared to short videos.
A significant performance gap exists between open-source and commercial models.
CG-Bench provides a new standard for trustworthy long video understanding evaluation.
Abstract
Most existing video understanding benchmarks for multimodal large language models (MLLMs) focus only on short videos. The few benchmarks for long video understanding often rely solely on multiple-choice questions (MCQs). However, because of the inherent limitations of MCQ-based evaluation and the increasing reasoning ability of MLLMs, models can give the correct answer purely by combining short video understanding with elimination, without genuinely understanding the video content. To address this gap, we introduce CG-Bench, a novel benchmark designed for clue-grounded question answering in long videos. CG-Bench emphasizes the model's ability to retrieve relevant clues for questions, enhancing evaluation credibility. It features 1,219 manually curated videos categorized by a granular system with 14 primary categories, 171 secondary categories, and 638 tertiary categories,…
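To make the idea of clue retrieval concrete, here is a minimal sketch of how a predicted clue interval could be compared with an annotated one using temporal intersection-over-union (IoU). The truncated abstract above does not state which metric CG-Bench uses, so temporal IoU, the function names, and the (start, end)-in-seconds interval format are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: a clue is a (start_sec, end_sec) interval.
Interval = Tuple[float, float]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if end <= start:
            continue  # skip empty or inverted intervals
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def total_length(intervals: List[Interval]) -> float:
    """Total duration covered by the intervals, counting overlaps once."""
    return sum(end - start for start, end in merge(intervals))


def temporal_iou(pred: List[Interval], gold: List[Interval]) -> float:
    """Temporal IoU between predicted and annotated clue intervals."""
    inter = 0.0
    for ps, pe in merge(pred):
        for gs, ge in merge(gold):
            inter += max(0.0, min(pe, ge) - max(ps, gs))
    union = total_length(pred) + total_length(gold) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # A prediction that half-overlaps the annotated clue.
    print(temporal_iou([(10.0, 20.0)], [(15.0, 25.0)]))
```

Under a scheme like this, a model that picks the right MCQ option by elimination but points at the wrong part of the video would score near zero on grounding, which is the kind of shortcut the abstract says MCQ-only evaluation cannot detect.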
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · COVID-19 diagnosis using AI
Methods: Focus
