HunyuanVideo 1.5 Technical Report

Bing Wu; Chang Zou; Changlin Li; Duojun Huang; Fang Yang; Hao Tan; Jack Peng; Jianbing Wu; Jiangfeng Xiong; Jie Jiang; Linus; Patrol; Peizhen Zhang; Peng Chen; Penghao Zhao; Qi Tian; Songtao Liu; Weijie Kong; Weiyan Wang; Xiao He; Xin Li; Xinchi Deng; Xuefei Zhe; Yang Li; Yanxin Long; Yuanbo Peng; Yue Wu; Yuhong Liu; Zhenyu Wang; Zuozhuo Dai; Bo Peng; Coopers Li; Gu Gong; Guojian Xiao; Jiahe Tian; Jiaxin Lin; Jie Liu; Jihong Zhang; Jiesong Lian; Kaihang Pan; Lei Wang; Lin Niu; Mingtao Chen; Mingyang Chen; Mingzhe Zheng; Miles Yang; Qiangqiang Hu; Qi Yang; Qiuyong Xiao; Runzhou Wu; Ryan Xu; Rui Yuan; Shanshan Sang; Shisheng Huang; Siruis Gong; Shuo Huang; Weiting Guo; Xiang Yuan; Xiaojia Chen; Xiawei Hu; Wenzhi Sun; Xiele Wu; Xianshun Ren; Xiaoyan Yuan; Xiaoyue Mi; Yepeng Zhang; Yifu Sun; Yiting Lu; Yitong Li; You Huang; Yu Tang; Yixuan Li; Yuhang Deng; Yuan Zhou; Zhichao Hu; Zhiguang Liu; Zhihe Yang; Zilin Yang; Zhenzhi Lu; Zixiang Zhou; Zhao Zhong

arXiv:2511.18870·cs.CV·November 26, 2025


Open Access · 4 Models · 1 Dataset

TL;DR

HunyuanVideo 1.5 is a lightweight, open-source video generation model that achieves state-of-the-art visual quality and motion coherence with only 8.3 billion parameters, enabling efficient inference on consumer-grade GPUs.
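As a rough, back-of-the-envelope check on the consumer-GPU claim, the snippet below computes how much memory 8.3 billion parameters occupy at common precisions. It counts weights only (no activations, text encoder, VAE, or super-resolution network). The precision and any offloading scheme the released model actually uses are not stated here, and the helper name `weight_memory_gib` is hypothetical.

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Memory needed to hold the weights alone, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


PARAMS = 8.3e9  # parameter count stated in the report

# Weights-only footprint at a few common storage precisions.
for dtype, nbytes in [("fp32", 4), ("bf16", 2), ("fp8", 1)]:
    print(f"{dtype}: {weight_memory_gib(PARAMS, nbytes):.1f} GiB")
```

At 16-bit precision the weights alone come to roughly 15.5 GiB, which is why the parameter count matters for fitting on a single consumer card.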

Contribution

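The paper introduces a compact 8.3-billion-parameter video generation framework that combines a DiT architecture with selective and sliding tile attention (SSTA), glyph-aware text encoding, progressive pre-training and post-training, and an efficient video super-resolution network, establishing a new state of the art among open-source video generation models.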

Findings

01. Achieves state-of-the-art visual quality and motion coherence among open-source video generation models.

02. Runs efficiently on consumer-grade GPUs.

03. Supports high-quality text-to-video and image-to-video generation across multiple durations and resolutions.

Abstract

We present HunyuanVideo 1.5, a lightweight yet powerful open-source video generation model that achieves state-of-the-art visual quality and motion coherence with only 8.3 billion parameters, enabling efficient inference on consumer-grade GPUs. This achievement is built upon several key components, including meticulous data curation, an advanced DiT architecture featuring selective and sliding tile attention (SSTA), enhanced bilingual understanding through glyph-aware text encoding, progressive pre-training and post-training, and an efficient video super-resolution network. Leveraging these designs, we developed a unified framework capable of high-quality text-to-video and image-to-video generation across multiple durations and resolutions. Extensive experiments demonstrate that this compact and proficient model establishes a new state-of-the-art among open-source video generation…
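The abstract names selective and sliding tile attention (SSTA) without describing it. As a rough illustration of the sliding-tile half only, the sketch below follows the general idea of sliding tile attention from prior work: latent video tokens are grouped into 3D tiles, and each query tile attends only to key tiles inside a fixed window around it. The function names, tile size, and window size are hypothetical. The "selective" component and HunyuanVideo 1.5's actual kernel are not reproduced, and this sketch simply truncates the window at the grid edges.

```python
import numpy as np


def sliding_tile_mask(grid, tile, window):
    """Boolean (N, N) mask: True where query token i may attend to key token j.

    grid:   (T, H, W) latent token grid; each dim divisible by the tile size
    tile:   (tt, th, tw) tile size in tokens
    window: (wt, wh, ww) window size in tiles (odd), centred on the query tile
    """
    T, H, W = grid
    # Tile coordinate of every token, flattened in row-major (t, h, w) order.
    t, h, w = np.meshgrid(np.arange(T), np.arange(H), np.arange(W), indexing="ij")
    coords = np.stack([t // tile[0], h // tile[1], w // tile[2]], axis=-1).reshape(-1, 3)
    radius = np.array(window) // 2
    # Two tokens interact only if their tiles lie within the radius on every axis.
    diff = np.abs(coords[:, None, :] - coords[None, :, :])
    return np.all(diff <= radius, axis=-1)


def masked_attention(q, k, v, mask):
    """Single-head softmax attention restricted to positions where mask is True."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    # Every row keeps at least its own tile, so the row max is finite.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


# Usage: 4x8x8 latent grid, 2x4x4 tiles, window of 1x3x3 tiles.
grid, tile, window = (4, 8, 8), (2, 4, 4), (1, 3, 3)
mask = sliding_tile_mask(grid, tile, window)
n = int(np.prod(grid))
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((n, 16)) for _ in range(3))
out = masked_attention(q, k, v, mask)
print(out.shape, mask.mean())
```

In this configuration each query sees only the tiles that share its temporal slot, i.e. half of the 256 tokens. A dense mask like this saves no computation by itself; efficient implementations skip the masked-out tile pairs entirely, which is where the speedup of tile-based sparse attention comes from.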


Code & Models

Models

Datasets

xingzhaohu/hunyuan1.5_training
dataset · 1.3k downloads


Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Face recognition and analysis