OpenHumanVid: A Large-Scale High-Quality Dataset for Enhancing Human-Centric Video Generation

Hui Li; Mingwang Xu; Yun Zhan; Shan Mu; Jiaye Li; Kaihui Cheng; Yuxuan Chen; Tan Chen; Mao Ye; Jingdong Wang; Siyu Zhu

arXiv:2412.00115 · cs.CV · January 7, 2025

Open Access

TL;DR

OpenHumanVid introduces a large, high-quality human-centric video dataset with detailed annotations, significantly improving the training and quality of human video generation models through enhanced data and alignment strategies.

Contribution

The paper presents a new large-scale dataset with detailed human-centric annotations and extends diffusion transformer models to improve human video generation.

Findings

1. Large-scale dataset improves evaluation metrics for human video generation.

2. Proper alignment of text with human appearance and motion is crucial for quality.

3. Extended models trained on the dataset outperform previous methods.

Abstract

Recent advancements in visual generation technologies have markedly increased the scale and availability of video datasets, which are crucial for training effective video generation models. However, a significant lack of high-quality, human-centric video datasets presents a challenge to progress in this field. To bridge this gap, we introduce OpenHumanVid, a large-scale and high-quality human-centric video dataset characterized by precise and detailed captions that encompass both human appearance and motion states, along with supplementary human motion conditions, including skeleton sequences and speech audio. To validate the efficacy of this dataset and the associated training strategies, we propose an extension of existing classical diffusion transformer architectures and conduct further pretraining of our models on the proposed dataset. Our findings yield two critical insights:…

Taxonomy

Topics: Human Pose and Action Recognition

Methods: Diffusion