MMVU: Measuring Expert-Level Multi-Discipline Video Understanding

Yilun Zhao; Lujing Xie; Haowei Zhang; Guo Gan; Yitao Long; Zhiyuan Hu; Tongyan Hu; Weiyuan Chen; Chuhan Li; Junyang Song; Zhijian Xu; Chengye Wang; Weifeng Pan; Ziyao Shangguan; Xiangru Tang; Zhenwen Liang; Yixin Liu; Chen Zhao; Arman Cohan

arXiv:2501.12380 · cs.CV · January 22, 2025

TL;DR

MMVU is a new comprehensive benchmark for evaluating foundation models in expert-level, multi-discipline video understanding, emphasizing domain-specific reasoning and high-quality annotations.

Contribution

It introduces an expert-annotated dataset of 3,000 questions, each paired with a reasoning rationale and the relevant domain knowledge, for evaluating models on specialized-domain video understanding.

Findings

1. State-of-the-art models still lag behind human experts.

2. Models perform better with domain-specific reasoning.

3. Error analysis reveals key challenges for future research.

Abstract

We introduce MMVU, a comprehensive expert-level, multi-discipline benchmark for evaluating foundation models in video understanding. MMVU includes 3,000 expert-annotated questions spanning 27 subjects across four core disciplines: Science, Healthcare, Humanities & Social Sciences, and Engineering. Compared to prior benchmarks, MMVU features three key advancements. First, it challenges models to apply domain-specific knowledge and perform expert-level reasoning to analyze specialized-domain videos, moving beyond the basic visual perception typically assessed in current video benchmarks. Second, each example is annotated by human experts from scratch, under strict data quality controls that ensure the high quality of the dataset. Finally, each example is enriched with expert-annotated reasoning rationales and relevant domain knowledge, facilitating in-depth analysis. We conduct an…
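To make the structure described in the abstract concrete, the sketch below shows one way an example and a per-discipline score could be represented. It is only an illustration: the `Example` field names, the `accuracy_by_discipline` helper, and the case-insensitive exact-match scoring are assumptions made here, not the dataset's actual schema or the authors' evaluation protocol. Only the four discipline names come from the abstract.

```python
from collections import defaultdict
from dataclasses import dataclass

# The four core disciplines named in the abstract.
DISCIPLINES = (
    "Science",
    "Healthcare",
    "Humanities & Social Sciences",
    "Engineering",
)


@dataclass(frozen=True)
class Example:
    """Hypothetical record layout; every field name is illustrative."""
    video: str             # path or URL of the source video
    discipline: str        # one of DISCIPLINES
    subject: str           # one of the 27 subjects
    question: str
    answer: str            # gold answer written by an expert
    rationale: str         # expert-annotated reasoning rationale
    domain_knowledge: str  # expert-annotated relevant domain knowledge


def accuracy_by_discipline(examples, predictions):
    """Return {discipline: fraction correct}.

    Scoring is case-insensitive exact match, which is an assumption
    of this sketch rather than the benchmark's own metric.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for ex, pred in zip(examples, predictions):
        if ex.discipline not in DISCIPLINES:
            raise ValueError(f"unknown discipline: {ex.discipline!r}")
        total[ex.discipline] += 1
        if pred.strip().lower() == ex.answer.strip().lower():
            correct[ex.discipline] += 1
    return {d: correct[d] / total[d] for d in total}
```

Grouping scores by discipline rather than reporting a single number mirrors the multi-discipline framing of the benchmark, and keeping the rationale and domain knowledge on each record is what would allow the kind of in-depth error analysis the abstract mentions.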

Code & Models

Repositories

yale-nlp/mmvu
PyTorch · Official

Taxonomy

Topics: Radiology practices and education · Clinical Reasoning and Diagnostic Skills