WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model

Zongjian Li, Bin Lin, Yang Ye, Liuhan Chen, Xinhua Cheng, Shenghai Yuan, and Li Yuan

arXiv:2411.17459 · cs.CV · April 14, 2025

TL;DR

WF-VAE introduces a wavelet-based approach to improve video encoding efficiency in VAEs, enabling faster, lower-memory latent video diffusion with better quality and continuity for long videos.

Contribution

The paper proposes WF-VAE, a novel wavelet-driven VAE that enhances encoding efficiency and maintains latent space integrity during block-wise inference in long videos.

Findings

1. 2x higher throughput than state-of-the-art video VAEs
2. 4x lower memory consumption
3. Competitive reconstruction quality is maintained

Abstract

A Video Variational Autoencoder (VAE) encodes videos into a low-dimensional latent space and has become a key component of most Latent Video Diffusion Models (LVDMs), where it reduces model training costs. However, as the resolution and duration of generated videos increase, the encoding cost of Video VAEs becomes a limiting bottleneck in training LVDMs. Moreover, the block-wise inference method adopted by most LVDMs can lead to discontinuities in the latent space when processing long-duration videos. The key to addressing the computational bottleneck lies in decomposing videos into distinct components and efficiently encoding the critical information. Because the wavelet transform can decompose videos into multiple frequency-domain components and improve efficiency significantly, we propose Wavelet Flow VAE (WF-VAE), an autoencoder that leverages multi-level wavelet transform to facilitate low-frequency…
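To make the decomposition idea in the abstract concrete, the following is a minimal sketch of a single-level 3D wavelet transform applied to a video tensor. It is illustrative only and is not the WF-VAE implementation: the Haar filter, the (T, H, W) layout, and the helper names haar_split, haar_merge, haar3d, and inverse_haar3d are all assumptions made for this example. It splits a video into eight frequency subbands and shows that, for smooth content, most of the energy lands in the low-frequency subband.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def haar_split(x, axis):
    """Split x along `axis` into orthonormal Haar low-pass and high-pass halves."""
    even = np.take(x, np.arange(0, x.shape[axis], 2), axis=axis)
    odd = np.take(x, np.arange(1, x.shape[axis], 2), axis=axis)
    return (even + odd) / SQRT2, (even - odd) / SQRT2


def haar_merge(low, high, axis):
    """Invert haar_split: interleave the reconstructed even and odd samples."""
    even = (low + high) / SQRT2
    odd = (low - high) / SQRT2
    stacked = np.stack([even, odd], axis=axis + 1)
    shape = list(low.shape)
    shape[axis] *= 2
    return stacked.reshape(shape)


def haar3d(video):
    """Single-level 3D Haar transform of a (T, H, W) array with even sizes.

    Returns 8 subbands keyed by three letters (time, height, width),
    e.g. 'LLL' is low-pass on every axis.
    """
    bands = {"": video}
    for axis in range(3):
        split = {}
        for key, arr in bands.items():
            low, high = haar_split(arr, axis)
            split[key + "L"] = low
            split[key + "H"] = high
        bands = split
    return bands


def inverse_haar3d(bands):
    """Reconstruct the (T, H, W) array from the 8 subbands of haar3d."""
    for axis in (2, 1, 0):
        merged = {}
        for key in {k[:-1] for k in bands}:
            merged[key] = haar_merge(bands[key + "L"], bands[key + "H"], axis)
        bands = merged
    return bands[""]


# A synthetic, smooth "video" (hypothetical data, 8 frames of 32x32 pixels).
t, y, x = np.meshgrid(
    np.linspace(0, 1, 8), np.linspace(0, 1, 32), np.linspace(0, 1, 32),
    indexing="ij",
)
video = np.sin(2 * np.pi * x) + np.cos(2 * np.pi * y) + 0.5 * t

bands = haar3d(video)
energy = {key: float(np.sum(arr ** 2)) for key, arr in bands.items()}
low_share = energy["LLL"] / sum(energy.values())
reconstruction = inverse_haar3d(bands)

print("subband shape:", bands["LLL"].shape)
print("share of energy in LLL:", round(low_share, 4))
print("lossless reconstruction:", np.allclose(reconstruction, video))
```

Each subband has half the resolution of the input along every axis, so an encoder that prioritizes the low-frequency subband works on a much smaller tensor. A multi-level transform, as mentioned in the abstract, would repeat haar3d on the 'LLL' output.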

Code & Models

Models

chestnutlzj/WF-VAE-L-16Chn (Hugging Face model, 21 downloads, 10 likes)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis

Methods: Diffusion