HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation

Zhenzhi Wang; Yixuan Li; Yanhong Zeng; Youqing Fang; Yuwei Guo; Wenran Liu; Jing Tan; Kai Chen; Tianfan Xue; Bo Dai; Dahua Lin

arXiv:2407.17438 · cs.CV · November 22, 2024

TL;DR

HumanVid is a large-scale, high-quality dataset combining real-world and synthetic videos with annotated human and camera motions, enabling improved training and benchmarking of camera-controllable human image animation models.

Contribution

The paper introduces HumanVid, the first comprehensive dataset with diverse real and synthetic human videos and camera motion annotations, facilitating fair benchmarking and advancing controllable human image animation.

Findings

01. Baseline model trained on HumanVid achieves state-of-the-art control over human and camera motions.

02. Synthetic data generation with rule-based camera trajectories enhances diversity and annotation accuracy.

03. HumanVid enables more transparent and fair benchmarking of human image animation methods.

Abstract

Human image animation involves generating videos from a character photo, allowing user control and unlocking the potential for video and movie production. While recent approaches yield impressive results using high-quality training data, the inaccessibility of these datasets hampers fair and transparent benchmarking. Moreover, these approaches prioritize 2D human motion and overlook the significance of camera motions in videos, leading to limited control and unstable video generation. To demystify the training data, we present HumanVid, the first large-scale high-quality dataset tailored for human image animation, which combines crafted real-world and synthetic data. For the real-world data, we compile a vast collection of real-world videos from the internet. We developed and applied careful filtering rules to ensure video quality, resulting in a curated collection of 20K…

Code & Models

Repositories

zhenzhiwang/humanvid (PyTorch, official)

Videos

HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation · SlidesLive