VideoLLM Benchmarks and Evaluation: A Survey

Yogesh Kumar

arXiv:2505.03829 · cs.CV · May 8, 2025

Open Access

TL;DR

This survey reviews current benchmarks and evaluation methods for Video Large Language Models (VideoLLMs), analyzing their strengths and limitations and outlining future directions for improving the assessment of video understanding.

Contribution

It provides a comprehensive analysis of existing VideoLLM benchmarks and evaluation protocols, and highlights key challenges and future research directions in the field.

Findings

01. Performance trends of state-of-the-art VideoLLMs across benchmarks

02. Identification of limitations in current evaluation protocols

03. Proposals for more diverse and interpretability-focused benchmarks

Abstract

The rapid development of Large Language Models (LLMs) has catalyzed significant advancements in video understanding technologies. This survey provides a comprehensive analysis of benchmarks and evaluation methodologies specifically designed or used for Video Large Language Models (VideoLLMs). We examine the current landscape of video understanding benchmarks, discussing their characteristics, evaluation protocols, and limitations. The paper analyzes various evaluation methodologies, including closed-set, open-set, and specialized evaluations for temporal and spatiotemporal understanding tasks. We highlight the performance trends of state-of-the-art VideoLLMs across these benchmarks and identify key challenges in current evaluation frameworks. Additionally, we propose future research directions to enhance benchmark design, evaluation metrics, and protocols, including the need for more…

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Graph Neural Networks