Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models

Munan Ning; Bin Zhu; Yujia Xie; Bin Lin; Jiaxi Cui; Lu Yuan; Dongdong Chen; Li Yuan

arXiv:2311.16103 · cs.CV · November 29, 2023 · 5 cites

TL;DR

Video-Bench provides a comprehensive evaluation framework and toolkit for assessing Video-LLMs across understanding, reasoning, and decision-making tasks, revealing current models' limitations in human-like video comprehension.

Contribution

The paper introduces Video-Bench, a new benchmark and toolkit for systematically evaluating Video-LLMs' capabilities across multiple levels and tasks.

Findings

1. Current Video-LLMs underperform in human-like comprehension.
2. Video-Bench covers diverse tasks for comprehensive evaluation.
3. Toolkit automates metric calculation and scoring.
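
As a rough illustration of what automated scoring across several tasks can look like, the sketch below computes per-task accuracy and an unweighted average over tasks. It assumes multiple-choice questions with a single correct option letter, which this summary does not state. The names `extract_choice` and `score` and the task labels are hypothetical and are not taken from the Video-Bench toolkit.

```python
import re
from collections import defaultdict


def extract_choice(response, options="ABCD"):
    """Return the first standalone uppercase option letter in a response, or None.

    Assumption: answers are single letters such as "B" or "(B)". This simple
    rule can misfire on responses that begin with the English word "A".
    """
    match = re.search(rf"\b([{options}])\b", response)
    return match.group(1) if match else None


def score(records):
    """Compute per-task accuracy and the unweighted mean over tasks.

    records: iterable of (task, response, answer) tuples, where `answer`
    is the correct option letter for that question.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for task, response, answer in records:
        total[task] += 1
        if extract_choice(response) == answer:
            correct[task] += 1
    per_task = {task: correct[task] / total[task] for task in total}
    # Guard against an empty input so the average is still defined.
    average = sum(per_task.values()) / len(per_task) if per_task else 0.0
    return per_task, average


# Hypothetical model outputs for two made-up tasks.
records = [
    ("task_a", "The answer is B", "B"),
    ("task_a", "C", "A"),
    ("task_b", "(D)", "D"),
]
per_task, average = score(records)
```

Averaging over tasks rather than over questions keeps a task with many questions from dominating the final score. Whether the actual toolkit weights tasks this way is not stated here.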

Abstract

Video-based large language models (Video-LLMs) have recently been introduced, targeting both fundamental improvements in perception and comprehension and a diverse range of user inquiries. In pursuit of the ultimate goal of artificial general intelligence, a truly intelligent Video-LLM should not only see and understand its surroundings but also possess human-level commonsense and make well-informed decisions for users. To guide the development of such a model, a robust and comprehensive evaluation system is crucial. To this end, this paper proposes Video-Bench, a new comprehensive benchmark along with a toolkit specifically designed for evaluating Video-LLMs. The benchmark comprises 10 meticulously crafted tasks, evaluating the capabilities of Video-LLMs across three distinct levels: Video-exclusive Understanding, Prior…

Code & Models

Repositories

pku-yuangroup/video-bench (Official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning