A Comprehensive Survey on Human Video Generation: Challenges, Methods, and Insights
Wentao Lei, Jinting Wang, Fengji Ma, Guanjie Huang, Li Liu

TL;DR
This survey reviews the advances, methods, datasets, and challenges in human video generation, with a focus on realistic synthesis guided by control conditions such as text, audio, and pose.
Contribution
To the best of the authors' knowledge, it is the first extensive literature review of human video generation, covering sub-tasks, datasets, evaluation metrics, and future research directions.
Findings
Summarizes key datasets and evaluation metrics used in the field.
Identifies the main challenges: character consistency, the complexity of human motion, and the relationship between humans and their environment.
Highlights recent progress in generative models for human videos.
Abstract
Human video generation is a dynamic and rapidly evolving task that aims to synthesize 2D human body video sequences with generative models given control conditions such as text, audio, and pose. With the potential for wide-ranging applications in film, gaming, and virtual communication, the ability to generate natural and realistic human video is critical. Recent advancements in generative models have laid a solid foundation for the growing interest in this area. Despite the significant progress, the task of human video generation remains challenging due to the consistency of characters, the complexity of human motion, and difficulties in their relationship with the environment. This survey provides a comprehensive review of the current state of human video generation, marking, to the best of our knowledge, the first extensive literature review in this domain. We start with an…
Taxonomy
Topics: Human Pose and Action Recognition
