Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos

Kairui Hu; Penghao Wu; Fanyi Pu; Wang Xiao; Yuanhan Zhang; Xiang Yue; Bo Li; Ziwei Liu

arXiv:2501.13826 · cs.CV · January 24, 2025

TL;DR

Video-MMMU introduces a comprehensive benchmark that evaluates how well large multimodal models (LMMs) acquire and use knowledge from videos across multiple disciplines, and shows that current models still lag well behind humans at learning from videos.

Contribution

The paper presents Video-MMMU, a new multi-disciplinary benchmark with a knowledge gain metric to systematically assess LMMs' knowledge acquisition from videos.

Findings

01. LMMs' performance declines with increasing cognitive demands.
02. Humans outperform models in knowledge acquisition tasks.
03. A significant gap exists between human and model learning from videos.

Abstract

Humans acquire knowledge through three cognitive stages: perceiving information, comprehending knowledge, and adapting knowledge to solve novel problems. Videos serve as an effective medium for this learning process, facilitating a progression through these cognitive stages. However, existing video benchmarks fail to systematically evaluate the knowledge acquisition capabilities of Large Multimodal Models (LMMs). To address this gap, we introduce Video-MMMU, a multi-modal, multi-disciplinary benchmark designed to assess LMMs' ability to acquire and utilize knowledge from videos. Video-MMMU features a curated collection of 300 expert-level videos and 900 human-annotated questions across six disciplines, evaluating knowledge acquisition through stage-aligned question-answer pairs: Perception, Comprehension, and Adaptation. A proposed knowledge gain metric, Δknowledge, quantifies…
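The abstract is cut off before it defines Δknowledge. As a rough illustration only, the sketch below assumes Δknowledge is a normalized gain: the change in a model's accuracy on a question set from before to after watching the video, divided by the accuracy headroom that remained before the video. The function name `delta_knowledge` and the percentage convention are hypothetical and are not taken from the paper or from lmms-eval.

```python
def delta_knowledge(acc_before: float, acc_after: float) -> float:
    """Normalized knowledge gain, in percent.

    ASSUMPTION: this normalized-gain form is an illustrative guess, not the
    paper's confirmed definition. Accuracies are percentages in [0, 100]:
    acc_before is measured without the video, acc_after with it.
    """
    if not (0.0 <= acc_before <= 100.0 and 0.0 <= acc_after <= 100.0):
        raise ValueError("accuracies must be percentages in [0, 100]")
    if acc_before == 100.0:
        # No room left to improve, so the normalized gain is undefined.
        raise ValueError("no headroom: acc_before is already 100%")
    # Raw improvement divided by the improvement that was still possible.
    return (acc_after - acc_before) / (100.0 - acc_before) * 100.0


if __name__ == "__main__":
    # Illustrative numbers only, not results from the paper.
    print(delta_knowledge(40.0, 55.0))  # 25.0
```

Normalizing by headroom means a 15-point raw gain from 40% to 55% counts as 25% of the remaining room for improvement, while a negative value would indicate the model did worse after watching the video.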

Code & Models

Repositories

evolvinglmms-lab/lmms-eval (PyTorch)

Taxonomy

Topics: Online and Blended Learning