TL;DR
HumanVid is a large-scale, high-quality dataset combining real-world and synthetic videos with annotated human and camera motions, enabling improved training and benchmarking of camera-controllable human image animation models.
Contribution
The paper introduces HumanVid, described as the first large-scale, high-quality dataset tailored for human image animation. It combines real-world and synthetic human videos with camera motion annotations, supporting fair benchmarking and camera-controllable animation.
Findings
A baseline model trained on HumanVid achieves state-of-the-art control over both human and camera motions.
Synthetic data generation with rule-based camera trajectories enhances diversity and annotation accuracy.
HumanVid enables more transparent and fair benchmarking of human image animation methods.
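As an illustration of what a rule-based camera trajectory could look like, the sketch below generates a simple orbit around a subject placed at the origin. The summary does not describe the rules HumanVid actually uses, so the function name `orbit_trajectory`, its parameters, and the orbit rule itself are hypothetical and chosen only for illustration.

```python
import math

def orbit_trajectory(radius, height, num_frames, sweep_deg=90.0):
    """Hypothetical rule: orbit the camera around a subject at the origin.

    Returns one (position, forward) pair per frame, where position is the
    camera location (x, y, z) and forward is a unit vector pointing
    horizontally from the camera toward the vertical axis through the subject.
    """
    if radius <= 0:
        raise ValueError("radius must be positive")
    frames = []
    for i in range(num_frames):
        # Normalized time in [0, 1]; a single frame stays at the start pose.
        t = i / (num_frames - 1) if num_frames > 1 else 0.0
        angle = math.radians(sweep_deg) * t
        x = radius * math.sin(angle)
        z = radius * math.cos(angle)
        position = (x, height, z)
        # The camera looks back at the subject along the horizontal plane.
        forward = (-x / radius, 0.0, -z / radius)
        frames.append((position, forward))
    return frames

trajectory = orbit_trajectory(radius=3.0, height=1.5, num_frames=5)
```

Because every pose follows from a closed-form rule, the camera annotation for each frame is known exactly rather than estimated, which is one plausible reading of how rule-based generation can help annotation accuracy.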
Abstract
Human image animation involves generating videos from a character photo, allowing user control and unlocking the potential for video and movie production. While recent approaches yield impressive results using high-quality training data, the inaccessibility of these datasets hampers fair and transparent benchmarking. Moreover, these approaches prioritize 2D human motion and overlook the significance of camera motions in videos, leading to limited control and unstable video generation. To demystify the training data, we present HumanVid, the first large-scale high-quality dataset tailored for human image animation, which combines crafted real-world and synthetic data. For the real-world data, we compile a vast collection of videos from the internet. We develop and apply careful filtering rules to ensure video quality, resulting in a curated collection of 20K…
