DLFR-VAE: Dynamic Latent Frame Rate VAE for Video Generation
Zhihang Yuan, Siyuan Wang, Rui Xie, Hanling Zhang, Tongcheng Fang, Yuzhang Shang, Shengen Yan, Guohao Dai, Yu Wang

TL;DR
DLFR-VAE introduces a training-free, adaptive approach to video generation that dynamically adjusts latent frame rates based on content complexity, improving efficiency and integration with existing models.
Contribution
It presents a novel, training-free method that transforms pretrained VAE models into dynamic, content-aware video generators with adaptive temporal compression.
Findings
Enables content-dependent variable frame rates in video generation.
Accelerates video processing by adapting to scene complexity.
Seamlessly integrates with existing models as a plug-and-play module.
Abstract
In this paper, we propose the Dynamic Latent Frame Rate VAE (DLFR-VAE), a training-free paradigm that enables adaptive temporal compression in latent space. While existing video generative models apply fixed compression rates via a pretrained VAE, we observe that real-world video content exhibits substantial temporal non-uniformity, with high-motion segments containing more information than static scenes. Based on this insight, DLFR-VAE dynamically adjusts the latent frame rate according to content complexity. Specifically, DLFR-VAE comprises two core innovations: (1) a Dynamic Latent Frame Rate Scheduler that partitions videos into temporal chunks and adaptively determines optimal frame rates based on information-theoretic content complexity, and (2) a training-free adaptation mechanism that transforms pretrained VAE architectures into a dynamic VAE that can process features…
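As a rough illustration of the scheduler idea in the abstract, the sketch below splits a video into fixed-size temporal chunks and assigns each chunk a temporal compression stride based on how much its frames change. This is not the paper's method: the abstract does not specify its complexity measure here, so the mean absolute frame difference is a stand-in, and the names `motion_score` and `schedule_temporal_strides`, the chunk size, the thresholds, and the stride values are all hypothetical.

```python
from typing import List, Sequence

# Assumption: each frame is flattened to a sequence of pixel intensities in [0, 1].
Frame = Sequence[float]


def motion_score(chunk: Sequence[Frame]) -> float:
    """Mean absolute difference between consecutive frames.

    A crude stand-in for content complexity; the paper's own measure
    is not reproduced here.
    """
    if len(chunk) < 2:
        return 0.0
    total = 0.0
    count = 0
    for prev, cur in zip(chunk, chunk[1:]):
        total += sum(abs(a - b) for a, b in zip(prev, cur))
        count += len(cur)
    return total / count


def schedule_temporal_strides(
    frames: Sequence[Frame],
    chunk_size: int = 16,                 # hypothetical chunk length
    thresholds: Sequence[float] = (0.02, 0.1),  # hypothetical cut-offs
    strides: Sequence[int] = (8, 4, 2),   # hypothetical strides, static -> dynamic
) -> List[int]:
    """Return one temporal stride per chunk.

    A larger stride means stronger temporal compression, i.e. a lower
    latent frame rate, and is assigned to chunks with less motion.
    """
    plan = []
    for start in range(0, len(frames), chunk_size):
        chunk = frames[start:start + chunk_size]
        score = motion_score(chunk)
        # Count how many thresholds the score reaches to pick a level.
        level = sum(score >= t for t in thresholds)
        plan.append(strides[level])
    return plan


if __name__ == "__main__":
    static = [[0.5] * 4 for _ in range(16)]                   # no motion
    flicker = [[float(i % 2)] * 4 for i in range(16)]         # strong motion
    print(schedule_temporal_strides(static + flicker))
```

In this sketch the static chunk receives the largest stride and the rapidly changing chunk the smallest, which mirrors the observation that high-motion segments carry more information than static scenes.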
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization · Advanced Vision and Imaging
