DLFR-VAE: Dynamic Latent Frame Rate VAE for Video Generation

Zhihang Yuan; Siyuan Wang; Rui Xie; Hanling Zhang; Tongcheng Fang; Yuzhang Shang; Shengen Yan; Guohao Dai; Yu Wang

arXiv:2502.11897 · cs.CV · April 3, 2025


TL;DR

DLFR-VAE introduces a training-free, adaptive approach to video generation that dynamically adjusts the latent frame rate based on content complexity, improving efficiency while remaining compatible with existing models.

Contribution

It presents a novel, training-free method that transforms pretrained VAEs into dynamic, content-aware VAEs with adaptive temporal compression.
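The text above does not spell out how a pretrained VAE is adapted without retraining, so the following is only a minimal illustration of what "adaptive temporal compression" means for latents. It assumes a simple hypothetical merge rule (average every `factor` consecutive latent frames) and a naive inverse (repeat frames, then trim). The names `downsample_chunk` and `upsample_chunk` are hypothetical; this is not the paper's adaptation mechanism.

```python
from typing import List

Frame = List[float]  # one latent frame flattened to a vector (illustrative)


def downsample_chunk(frames: List[Frame], factor: int) -> List[Frame]:
    """Average every `factor` consecutive latent frames (hypothetical merge rule)."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    out: List[Frame] = []
    for start in range(0, len(frames), factor):
        group = frames[start:start + factor]  # last group may be shorter
        # Element-wise mean across the frames in this group.
        out.append([sum(vals) / len(group) for vals in zip(*group)])
    return out


def upsample_chunk(frames: List[Frame], factor: int, length: int) -> List[Frame]:
    """Repeat each reduced frame `factor` times, then trim to the original length."""
    out = [list(f) for f in frames for _ in range(factor)]
    return out[:length]


frames = [[0.0], [2.0], [4.0], [6.0], [8.0]]
reduced = downsample_chunk(frames, 2)       # [[1.0], [5.0], [8.0]]
restored = upsample_chunk(reduced, 2, 5)    # [[1.0], [1.0], [5.0], [5.0], [8.0]]
```

Averaging and repetition are chosen only because they are the simplest invertible-in-shape pair; a real adaptation would need to keep the decoder's inputs within the distribution it was trained on.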

Findings

01

Enables content-dependent variable frame rates in video generation.

02

Accelerates video generation by lowering the latent frame rate of simple, low-motion segments.

03

Seamlessly integrates with existing models as a plug-and-play module.
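As a rough way to see why fewer latent frames can mean faster generation, here is back-of-envelope arithmetic. It assumes (not stated in the source) that every latent frame carries the same number of tokens and that the generator applies full self-attention over all tokens, so attention cost scales with the square of the latent frame count. The helper names are hypothetical, and the result is an attention-only proxy, not an end-to-end speedup.

```python
from typing import Sequence, Tuple


def latent_budget(chunk_lengths: Sequence[int], factors: Sequence[int]) -> Tuple[int, int]:
    """Return (latent frames at a fixed rate, latent frames at per-chunk dynamic rates)."""
    fixed = sum(chunk_lengths)
    # Ceiling division: a chunk of n frames at factor f keeps ceil(n / f) frames.
    dynamic = sum(-(-n // f) for n, f in zip(chunk_lengths, factors))
    return fixed, dynamic


def relative_attention_cost(fixed: int, dynamic: int) -> float:
    """Quadratic proxy: full self-attention cost grows with (token count) ** 2."""
    return (dynamic / fixed) ** 2


fixed, dynamic = latent_budget([8, 8, 8], [1, 2, 4])  # (24, 14)
ratio = relative_attention_cost(fixed, dynamic)        # about 0.34
```

With three equal chunks kept at full, half, and quarter rate, the latent sequence shrinks from 24 to 14 frames; under the quadratic assumption, attention would cost roughly a third of the fixed-rate baseline, while layers that scale linearly would save proportionally less.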

Abstract

In this paper, we propose the Dynamic Latent Frame Rate VAE (DLFR-VAE), a training-free paradigm that can make use of adaptive temporal compression in latent space. While existing video generative models apply fixed compression rates via pretrained VAEs, we observe that real-world video content exhibits substantial temporal non-uniformity, with high-motion segments containing more information than static scenes. Based on this insight, DLFR-VAE dynamically adjusts the latent frame rate according to the content complexity. Specifically, DLFR-VAE comprises two core innovations: (1) A Dynamic Latent Frame Rate Scheduler that partitions videos into temporal chunks and adaptively determines optimal frame rates based on information-theoretic content complexity, and (2) A training-free adaptation mechanism that transforms pretrained VAE architectures into a dynamic VAE that can process features…
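The scheduler described in (1) can be sketched as follows. The abstract does not give the exact complexity measure, so this sketch swaps in a hypothetical information-theoretic proxy: the Shannon entropy (in bits) of quantized frame-to-frame pixel differences within each chunk. The bin count, chunk size, entropy thresholds, and the factor set {4, 2, 1} are all illustrative assumptions, and `chunk_entropy` and `schedule` are hypothetical names.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Frame = List[float]  # grayscale pixels in [0, 1], flattened (illustrative)


def chunk_entropy(frames: Sequence[Frame], bins: int = 16) -> float:
    """Shannon entropy (bits) of quantized frame-to-frame differences.

    Hypothetical proxy: static chunks produce near-zero differences that fall
    into one bin (low entropy); high-motion chunks spread across many bins.
    """
    counts: Counter = Counter()
    for prev, cur in zip(frames, frames[1:]):
        for a, b in zip(prev, cur):
            d = b - a  # difference lies in [-1, 1]
            idx = min(int((d + 1.0) / 2.0 * bins), bins - 1)
            counts[idx] += 1
    total = sum(counts.values())
    if total == 0:  # a single-frame chunk has no differences
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def schedule(frames: Sequence[Frame], chunk_size: int = 4,
             thresholds: Tuple[float, float] = (0.5, 1.5)) -> List[int]:
    """Return one temporal downsampling factor per chunk: 4 (static), 2, or 1 (complex)."""
    factors = []
    for start in range(0, len(frames), chunk_size):
        h = chunk_entropy(frames[start:start + chunk_size])
        if h < thresholds[0]:
            factors.append(4)
        elif h < thresholds[1]:
            factors.append(2)
        else:
            factors.append(1)
    return factors


ramp = [i / 8 for i in range(8)]
static = [[0.5] * 8 for _ in range(4)]
moving = [[0.0] * 8, ramp, [0.0] * 8, ramp]
print(schedule(static + moving))  # [4, 1]
```

The design intent mirrors the abstract's observation: chunks whose content barely changes get aggressive temporal compression, while chunks with rich motion keep the full latent frame rate.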


Code & Models

Repositories

thu-nics/dlfr-vae (official)


Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization · Advanced Vision and Imaging