HunyuanVideo: A Systematic Framework For Large Video Generative Models
Weijie Kong, Qi Tian, Zijian Zhang, Rox Min, Zuozhuo Dai, Jin Zhou, Jiangfeng Xiong, Xin Li, Bo Wu, Jianwei Zhang, Kathrina Wu, Qin Lin, Junkun Yuan, Yanxin Long, Aladdin Wang, Andong Wang, Changlin Li, Duojun Huang, Fang Yang, Hao Tan, Hongmei Wang, Jacob Song, Jiawang Bai

TL;DR
HunyuanVideo is an open-source large-scale video generative model that achieves state-of-the-art performance, surpassing many existing models, and aims to democratize access to advanced video generation technology.
Contribution
The paper introduces HunyuanVideo, the largest open-source video foundation model with over 13 billion parameters, and presents a comprehensive framework for high-quality, large-scale video generation.
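To give a sense of what "over 13 billion parameters" means in practice, the sketch below estimates the memory needed just to hold the weights at a few common numeric precisions. It is a back-of-the-envelope illustration, not a figure from the paper: the 13e9 count is taken as the stated lower bound, the function name `weight_memory_gib` is hypothetical, and activations, text encoders, the VAE, and optimizer state are all ignored.

```python
# Rough weight-only memory estimate for a model of a given parameter count.
# Assumption: 13e9 is the paper's stated lower bound ("over 13 billion");
# the real checkpoint size and runtime footprint will differ.

BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "int8": 1}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Return the size of the weights alone, in GiB, for the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    num_params = 13_000_000_000
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_memory_gib(num_params, dtype):.1f} GiB")
```

At 16-bit precision the weights alone come to roughly 24 GiB under these assumptions, which is why reduced-precision or offloaded variants are common for models of this scale.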
Findings
HunyuanVideo outperforms previous state-of-the-art models.
The model demonstrates high visual quality and accurate motion dynamics.
Open-source release fosters community experimentation and innovation.
Abstract
Recent advancements in video generation have significantly impacted daily life for both individuals and industries. However, the leading video generation models remain closed-source, resulting in a notable performance gap between industry capabilities and those available to the public. In this report, we introduce HunyuanVideo, an innovative open-source video foundation model that demonstrates performance in video generation comparable to, or even surpassing, that of leading closed-source models. HunyuanVideo encompasses a comprehensive framework that integrates several key elements, including data curation, advanced architectural design, progressive model scaling and training, and an efficient infrastructure tailored for large-scale model training and inference. As a result, we successfully trained a video generative model with over 13 billion parameters, making it the largest among…
Code & Models
- 🤗 tencent/HunyuanVideo · model · 1.0k dl · ♡ 2142
- 🤗 benjamin-paine/taproot-common · model · 433 dl · ♡ 5
- 🤗 tencent/HunyuanVideo-PromptRewrite · model · 33 dl · ♡ 53
- 🤗 5UK5Qt/HunyuanVideo-PromptRewrite-HF · model · 1 dl
- 🤗 jbilcke-hf/HunyuanVideoGP-HFIE · model · 7 dl · ♡ 4
- 🤗 Cseti/HunyuanVideo-LoRA-Arcane_Jinx-v1 · model · ♡ 18
- 🤗 Cseti/HunyuanVideo-LoRA-Arcane_Style · model · ♡ 5
- 🤗 FastVideo/Hunyuan-Black-Myth-Wukong-lora-weight · model · ♡ 3
- 🤗 jobs-git/HunyuanVideo · model · 1 dl
- 🤗 tencent/HunyuanVideo-I2V · model · 145 dl · ♡ 350
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization · Human Pose and Action Recognition
