EasyAnimate: High-Performance Video Generation Framework with Hybrid Windows Attention and Reward Backpropagation
Jiaqi Xu, Kunzhe Huang, Xinyi Zou, Yunkuo Chen, Bo Liu, MengLi Cheng, Jun Huang, Xing Shi

TL;DR
EasyAnimate is a high-performance video generation framework that combines hybrid window attention and reward backpropagation to improve quality and efficiency, achieving state-of-the-art results.
Contribution
The paper introduces Hybrid Window Attention and reward backpropagation, along with training strategies and multimodal text encoding, to enhance video generation speed and quality.
Findings
Achieves state-of-the-art performance on VBench and human evaluations.
Significantly improves computational efficiency and video quality.
Introduces novel attention and training methods for video diffusion models.
Abstract
This paper introduces EasyAnimate, an efficient, high-quality video generation framework that leverages diffusion transformers and encompasses data processing, model training, and end-to-end inference. Despite substantial advances in video diffusion models, existing video generation models still struggle with slow generation speeds and less-than-ideal video quality. To improve training and inference efficiency without compromising performance, we propose Hybrid Window Attention. Within Hybrid Window Attention, we design a multidirectional sliding window attention, which provides stronger receptive capabilities across the three dimensions of a video than the naive one, while reducing the model's computational complexity as the video sequence length increases. To enhance video generation quality, we optimize EasyAnimate using reward backpropagation to…
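The abstract's complexity claim for sliding window attention can be illustrated with a small counting sketch. This is not the authors' implementation: it assumes a plain local 3D window in which each token at position (t, h, w) attends only to tokens within a fixed half-width along each axis, and the function names and the `window` parameter are hypothetical. It does not model the multidirectional or hybrid aspects described in the paper.

```python
from itertools import product


def window_neighbors(shape, window):
    """Map each token (t, h, w) to the tokens inside its local 3D window.

    shape  -- (T, H, W) size of the latent video grid
    window -- assumed half-widths (wt, wh, ww) along time, height, width
    """
    T, H, W = shape
    wt, wh, ww = window
    neighbors = {}
    for t, h, w in product(range(T), range(H), range(W)):
        # Clamp the window at the grid borders.
        neighbors[(t, h, w)] = [
            (t2, h2, w2)
            for t2 in range(max(0, t - wt), min(T, t + wt + 1))
            for h2 in range(max(0, h - wh), min(H, h + wh + 1))
            for w2 in range(max(0, w - ww), min(W, w + ww + 1))
        ]
    return neighbors


def attended_pairs(shape, window):
    """Count query-key pairs evaluated under the local window."""
    return sum(len(keys) for keys in window_neighbors(shape, window).values())


def full_pairs(shape):
    """Count query-key pairs evaluated under full attention."""
    n = shape[0] * shape[1] * shape[2]
    return n * n
```

On a 4x4x4 grid with half-width 1 on every axis, the window evaluates 1000 query-key pairs against 4096 for full attention. With the window held fixed, the pair count grows roughly linearly in the number of tokens, while full attention grows quadratically, which is the effect the abstract refers to for longer video sequences.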
Code & Models
- 🤗 alibaba-pai/EasyAnimateV2-XL-2-512x512 · model · 5 dl · ♡ 4
- 🤗 alibaba-pai/EasyAnimateV2-XL-2-768x768 · model · 8 dl · ♡ 17
- 🤗 alibaba-pai/EasyAnimateV3-XL-2-InP-768x768 · model · 8 dl · ♡ 7
- 🤗 alibaba-pai/EasyAnimateV3-XL-2-InP-512x512 · model · 12 dl · ♡ 3
- 🤗 alibaba-pai/EasyAnimateV3-XL-2-InP-960x960 · model · 6 dl · ♡ 4
- 🤗 alibaba-pai/EasyAnimateV4-XL-2-InP · model · 44 dl · ♡ 11
- 🤗 alibaba-pai/EasyAnimateV5-12b-zh-Control · model · 8 dl · ♡ 11
- 🤗 alibaba-pai/EasyAnimateV5-12b-zh · model · 13 dl · ♡ 14
- 🤗 alibaba-pai/EasyAnimateV5-7b-zh-InP · model · 5 dl · ♡ 3
- 🤗 alibaba-pai/EasyAnimateV5-7b-zh · model · 7 dl · ♡ 2
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Image Processing Techniques · Video Coding and Compression Technologies
