HuM-Eval: A Coarse-to-Fine Framework for Human-Centric Video Evaluation

Bingzi Zhang; Kaisi Guan; Ruihua Song

arXiv:2604.25361 · cs.CV · April 29, 2026

TL;DR

HuM-Eval is a new human-centric video evaluation framework that combines global assessment with detailed analysis of human motion to better match human preferences.

Contribution

The paper introduces a coarse-to-fine evaluation approach that combines a vision-language model with human motion analysis, along with a new benchmark, HuM-Bench, for assessing human motion in generated videos.

Findings

01. HuM-Eval achieves 58.2% average human correlation, outperforming previous metrics.

02. The framework effectively combines global quality assessment with detailed human motion verification.

03. HuM-Bench provides a diverse dataset for evaluating human motion generation models.
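
To make the "human correlation" in the first finding concrete, here is a minimal sketch of how a metric's scores might be compared with human ratings. This page does not say which correlation statistic lies behind the 58.2% figure; Spearman rank correlation is an assumption here, and the function names `average_ranks` and `spearman` are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(metric_scores, human_scores):
    """Spearman correlation: Pearson correlation computed on the ranks.

    Assumption: Spearman is used for illustration only; the source does
    not name the statistic it reports.
    """
    if len(metric_scores) != len(human_scores) or len(metric_scores) < 2:
        raise ValueError("need two equally long lists with at least 2 items")
    rx = average_ranks(metric_scores)
    ry = average_ranks(human_scores)
    mx = sum(rx) / len(rx)
    my = sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # a constant list carries no ranking information
    return cov / sqrt(vx * vy)
```

A call such as `spearman(metric_scores, human_ratings)` takes one score per video from each source. A value near 1 means the metric orders the videos much as the human raters do.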

Abstract

Video generation models have developed rapidly in recent years, and generating natural human motion plays a pivotal role in them. However, accurately evaluating the quality of human motion in generated videos remains a significant challenge. Existing evaluation metrics primarily focus on global scene statistics, often overlooking fine-grained human details and consequently failing to align with subjective human preference. To bridge this gap, we propose HuM-Eval, a novel human-centric evaluation framework that adopts a coarse-to-fine strategy. Specifically, our framework first uses a Vision Language Model to perform a coarse assessment of global video quality. It then proceeds to a fine-grained analysis, using 2D pose to verify anatomical correctness and 3D human motion to evaluate motion stability. Extensive experiments demonstrate that HuM-Eval achieves an average human correlation of…
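
The coarse-to-fine strategy in the abstract can be illustrated with a rough sketch. Everything specific below is an assumption, not the authors' method: the abstract does not say how the anatomical or stability checks are computed or how the stages are combined. Here bone-length variation in 2D stands in for anatomical correctness, the second finite difference of 3D joint positions stands in for motion stability, and the coarse score is assumed to come from some external VLM call. All function names are hypothetical.

```python
from math import dist
from statistics import mean, pstdev


def bone_length_variation(poses_2d, bones):
    """Mean coefficient of variation of each bone's 2D length across frames.

    Assumed proxy for anatomical correctness. `poses_2d` is a non-empty list
    of frames, each a list of (x, y) joints; `bones` is a non-empty list of
    (joint_index_a, joint_index_b) pairs.
    """
    variations = []
    for a, b in bones:
        lengths = [dist(frame[a], frame[b]) for frame in poses_2d]
        avg = mean(lengths)
        variations.append(pstdev(lengths) / avg if avg > 0 else 0.0)
    return mean(variations)


def motion_jitter(joints_3d):
    """Mean magnitude of the second finite difference of 3D joint positions.

    Assumed proxy for motion stability; returns 0.0 for fewer than 3 frames.
    """
    magnitudes = []
    for prev, cur, nxt in zip(joints_3d, joints_3d[1:], joints_3d[2:]):
        for p, c, n in zip(prev, cur, nxt):
            accel = [pn - 2 * pc + pp for pp, pc, pn in zip(p, c, n)]
            magnitudes.append(dist(accel, [0.0] * len(accel)))
    return mean(magnitudes) if magnitudes else 0.0


def coarse_to_fine_score(coarse_score, poses_2d, joints_3d, bones):
    """Discount a coarse global score by the two fine-grained penalties.

    The multiplicative combination is a hypothetical choice for illustration.
    """
    anatomy_penalty = bone_length_variation(poses_2d, bones)
    stability_penalty = motion_jitter(joints_3d)
    return coarse_score / ((1.0 + anatomy_penalty) * (1.0 + stability_penalty))
```

With this combination, a video whose skeleton keeps constant bone lengths and moves at constant velocity retains its coarse score unchanged, while flickering limbs or jittery motion pull the score down.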
