CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding

Guo Chen; Yicheng Liu; Yifei Huang; Yuping He; Baoqi Pei; Jilan Xu; Yali Wang; Tong Lu; Limin Wang

arXiv:2412.12075 · cs.CV · December 17, 2024

TL;DR

CG-Bench is a comprehensive long video understanding benchmark that emphasizes clue-grounded question answering, addressing limitations of existing short-video focused benchmarks and multiple-choice evaluations.

Contribution

It introduces the largest long video analysis benchmark with novel clue-grounded evaluation methods to better assess true understanding of videos by multimodal large language models.

Findings

01

Current models underperform on long videos compared to short videos.

02

A significant performance gap exists between open-source and commercial models.

03

CG-Bench provides a new standard for trustworthy long video understanding evaluation.

Abstract

Most existing video understanding benchmarks for multimodal large language models (MLLMs) focus only on short videos. The few benchmarks for long video understanding often rely solely on multiple-choice questions (MCQs). However, because of the inherent limitations of MCQ-based evaluation and the increasing reasoning ability of MLLMs, models can give the correct answer purely by combining short video understanding with elimination, without genuinely understanding the video content. To address this gap, we introduce CG-Bench, a novel benchmark designed for clue-grounded question answering in long videos. CG-Bench emphasizes the model's ability to retrieve relevant clues for questions, enhancing evaluation credibility. It features 1,219 manually curated videos categorized by a granular system with 14 primary categories, 171 secondary categories, and 638 tertiary categories,…

Code & Models

Datasets

CG-Bench/CG-Bench
dataset · 4.1k downloads

Videos

CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · COVID-19 diagnosis using AI

Methods: Focus