OmniHuman: A Large-scale Dataset and Benchmark for Human-Centric Video Generation

Lei Zhu; Xing Cai; Yingjie Chen; Yiheng Li; Binxin Yang; Hao Liu; Jie Chen; Chen Li; Jing LYu

arXiv:2604.18326·cs.CV·April 21, 2026

TL;DR

OmniHuman introduces a large-scale, multi-scene dataset and benchmark for human-centric video generation. It targets three gaps in existing datasets: limited scene and camera diversity, sparse interaction modeling, and insufficient individual attribute alignment.

Contribution

The paper presents OmniHuman, a comprehensive dataset with hierarchical annotations and an evaluation benchmark for improved human-centric video synthesis.

Findings

01. The OmniHuman dataset covers diverse scenes, interactions, and attributes.

02. The benchmark includes metrics aligned with human perception.

03. Results demonstrate improved evaluation of human-centric video quality.

Abstract

Recent advancements in audio-video joint generation models have demonstrated impressive capabilities in content creation. However, generating high-fidelity human-centric videos in complex, real-world physical scenes remains a significant challenge. We identify that the root cause lies in the structural deficiencies of existing datasets across three dimensions: limited global scene and camera diversity, sparse interaction modeling (both person-person and person-object), and insufficient individual attribute alignment. To bridge these gaps, we present OmniHuman, a large-scale, multi-scene dataset designed for fine-grained human modeling. OmniHuman provides a hierarchical annotation covering video-level scenes, frame-level interactions, and individual-level attributes. To facilitate this, we develop a fully automated pipeline for high-quality data collection and multi-modal annotation.…
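To make the three annotation levels named in the abstract concrete, here is a minimal sketch of how one record might be organized. The class names, field names, and example values are all hypothetical illustrations; the abstract does not specify the dataset's actual schema or file format.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PersonAttributes:
    # Individual-level attributes (hypothetical fields, not the real schema).
    person_id: str
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class FrameInteraction:
    # Frame-level interaction; kind is "person-person" or "person-object".
    frame_index: int
    kind: str
    participants: List[str]


@dataclass
class VideoAnnotation:
    # Video-level scene and camera labels, with the two lower levels nested.
    video_id: str
    scene: str
    camera: str
    interactions: List[FrameInteraction] = field(default_factory=list)
    people: List[PersonAttributes] = field(default_factory=list)


def interaction_counts(video: VideoAnnotation) -> Counter:
    """Count frame-level interactions by kind for one video."""
    return Counter(i.kind for i in video.interactions)


# Made-up record used only to exercise the structure.
example = VideoAnnotation(
    video_id="demo_0001",
    scene="kitchen",
    camera="static",
    interactions=[
        FrameInteraction(0, "person-person", ["p1", "p2"]),
        FrameInteraction(12, "person-object", ["p1", "cup"]),
        FrameInteraction(30, "person-object", ["p2", "door"]),
    ],
    people=[
        PersonAttributes("p1", {"clothing": "red jacket"}),
        PersonAttributes("p2", {"clothing": "blue shirt"}),
    ],
)

counts = interaction_counts(example)
print(counts["person-person"], counts["person-object"])
```

Nesting the frame-level and individual-level records under the video-level record mirrors the hierarchy the abstract describes; a flat table keyed by video and frame would be an equally plausible layout.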
