EasyAnimate: High-Performance Video Generation Framework with Hybrid Windows Attention and Reward Backpropagation

Jiaqi Xu; Kunzhe Huang; Xinyi Zou; Yunkuo Chen; Bo Liu; MengLi Cheng; Jun Huang; Xing Shi

arXiv:2405.18991·cs.CV·March 6, 2026·2 cites


TL;DR

EasyAnimate is a video generation framework that combines Hybrid Window Attention and reward backpropagation to improve generation quality and efficiency, reporting state-of-the-art results on VBench and in human evaluations.

Contribution

The paper introduces Hybrid Window Attention and reward backpropagation, along with training strategies and multimodal text encoding, to enhance video generation speed and quality.

Findings

01. Achieves state-of-the-art performance on VBench and human evaluations.

02. Significantly improves computational efficiency and video quality.

03. Introduces novel attention and training methods for video diffusion models.

Abstract

This paper introduces EasyAnimate, an efficient video generation framework that leverages diffusion transformers to produce high-quality video, encompassing data processing, model training, and end-to-end inference. Despite substantial advances in video diffusion models, existing video generation models still struggle with slow generation speeds and less-than-ideal video quality. To improve training and inference efficiency without compromising performance, we propose Hybrid Window Attention. Within Hybrid Window Attention, we design a multidirectional sliding window attention that provides stronger receptive capabilities across the three dimensions of video than the naive variant, while reducing the model's computational complexity as the video sequence length increases. To enhance video generation quality, we optimize EasyAnimate using reward backpropagation to…

Code & Models

Repositories

aigc-apps/easyanimate (PyTorch, official)

Taxonomy

Topics: Advanced Vision and Imaging · Advanced Image Processing Techniques · Video Coding and Compression Technologies