Latent Video Diffusion Models for High-Fidelity Long Video Generation
Yingqing He, Tianyu Yang, Yong Zhang, Ying Shan, Qifeng Chen

TL;DR
This paper introduces lightweight hierarchical latent diffusion models for high-fidelity, long video generation, significantly improving quality and length over previous methods while maintaining computational efficiency.
Contribution
The paper proposes a novel hierarchical latent diffusion framework with conditional latent perturbation and guidance, enabling the generation of realistic, long videos at reduced computational cost.
Findings
Outperforms previous pixel-space video diffusion models in quality and length under a limited computational budget
Enables generation of videos with over a thousand frames
Demonstrates superior results on small-domain and text-to-video generation tasks
Abstract
AI-generated content has attracted lots of attention recently, but photo-realistic video synthesis is still challenging. Although many attempts using GANs and autoregressive models have been made in this area, the visual quality and length of generated videos are far from satisfactory. Diffusion models have shown remarkable results recently but require significant computational resources. To address this, we introduce lightweight video diffusion models by leveraging a low-dimensional 3D latent space, significantly outperforming previous pixel-space video diffusion models under a limited computational budget. In addition, we propose hierarchical diffusion in the latent space such that longer videos with more than one thousand frames can be produced. To further overcome the performance degradation issue for long video generation, we propose conditional latent perturbation and…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis
Methods: Diffusion
