UniVBench: Towards Unified Evaluation for Video Foundation Models

Jianhui Wei; Xiaotian Zhang; Yichen Li; Yuan Wang; Yan Zhang; Ziyi Chen; Zhihang Tang; Wei Xu; Zuozhu Liu

arXiv:2602.21835 · cs.CV · March 9, 2026

TL;DR

UniVBench introduces a comprehensive, multi-task benchmark for evaluating video foundation models across understanding, generation, editing, and reconstruction, addressing the limitations of existing fragmented evaluation methods.

Contribution

The paper presents UniVBench, a unified evaluation framework with diverse high-quality videos and a standardized assessment system for multiple core video model capabilities.

Findings

01. Expanded evaluation complexity with 200 diverse videos
02. Unified scoring system for fair comparison
03. Alignment of evaluation with human judgment

Abstract

Video foundation models aim to integrate video understanding, generation, editing, and instruction following within a single framework, making them a central direction for next-generation multimodal systems. However, existing evaluation benchmarks remain fragmented and limited in scope, as they each target a single task, rely on task-specific metrics, and typically use short or simple video clips. As a result, they do not capture the unified capabilities that these models are designed to deliver. To address this gap, we introduce UniVBench, a benchmark purpose-built for evaluating video foundation models across four core abilities: video understanding, video generation, video editing, and a newly proposed task, video reconstruction, which assesses how faithfully a model can reproduce video content it has encountered. Our benchmark substantially expands the complexity of evaluation by…

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization