OmniHuman: A Large-scale Dataset and Benchmark for Human-Centric Video Generation
Lei Zhu, Xing Cai, Yingjie Chen, Yiheng Li, Binxin Yang, Hao Liu, Jie Chen, Chen Li, Jing LYu

TL;DR
OmniHuman introduces a large-scale, multi-scene dataset and benchmark for human-centric video generation, addressing current data limitations and enabling more realistic, detailed synthesis.
Contribution
The paper presents OmniHuman, a comprehensive dataset with hierarchical annotations and an evaluation benchmark for improved human-centric video synthesis.
Findings
OmniHuman dataset covers diverse scenes, interactions, and attributes.
The benchmark includes metrics aligned with human perception.
Results demonstrate improved evaluation of human-centric video quality.
Abstract
Recent advancements in audio-video joint generation models have demonstrated impressive capabilities in content creation. However, generating high-fidelity human-centric videos in complex, real-world physical scenes remains a significant challenge. We identify that the root cause lies in the structural deficiencies of existing datasets across three dimensions: limited global scene and camera diversity, sparse interaction modeling (both person-person and person-object), and insufficient individual attribute alignment. To bridge these gaps, we present OmniHuman, a large-scale, multi-scene dataset designed for fine-grained human modeling. OmniHuman provides a hierarchical annotation covering video-level scenes, frame-level interactions, and individual-level attributes. To facilitate this, we develop a fully automated pipeline for high-quality data collection and multi-modal annotation.…
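The three annotation levels named in the abstract (video-level scenes, frame-level interactions, individual-level attributes) could be represented roughly as below. This is a minimal sketch, not the dataset's actual schema: every class name, field name, and example value here is hypothetical, chosen only to illustrate how the levels might nest.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PersonAttributes:
    """Individual-level annotation (hypothetical fields)."""
    person_id: int
    description: str  # assumed free-text attribute caption


@dataclass
class FrameInteraction:
    """Frame-level annotation (hypothetical fields)."""
    frame_index: int
    kind: str  # "person-person" or "person-object", as named in the abstract
    participants: List[int]  # person_id values involved
    description: str


@dataclass
class VideoAnnotation:
    """Video-level annotation that owns the two finer levels (hypothetical fields)."""
    video_id: str
    scene: str
    camera: str
    interactions: List[FrameInteraction] = field(default_factory=list)
    people: List[PersonAttributes] = field(default_factory=list)

    def interactions_for(self, person_id: int) -> List[FrameInteraction]:
        """Return every frame-level interaction that involves the given person."""
        return [i for i in self.interactions if person_id in i.participants]


# Illustrative record; all values are invented for the example.
example = VideoAnnotation(
    video_id="clip_0001",
    scene="kitchen",
    camera="static",
    interactions=[
        FrameInteraction(0, "person-object", [0], "person 0 picks up a cup"),
        FrameInteraction(12, "person-person", [0, 1], "person 0 hands the cup to person 1"),
    ],
    people=[
        PersonAttributes(0, "adult, red sweater"),
        PersonAttributes(1, "adult, glasses"),
    ],
)

print(len(example.interactions_for(1)))  # prints 1
```

Keeping the interactions keyed by person identifier is one way to link the frame level to the individual level; the paper's own format may differ.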
