Flash-VAED: Plug-and-Play VAE Decoders for Efficient Video Generation

Lunjie Zhu; Yushi Huang; Xingtong Ge; Yufei Xue; Zhening Liu; Yumeng Zhang; Zehong Lin; Jun Zhang

arXiv:2602.19161·cs.CV·February 24, 2026

TL;DR

This paper introduces Flash-VAED, a universal framework for accelerating VAE decoders in video generation. It speeds up decoding by approximately 6× while preserving high reconstruction quality, combining channel pruning, operator optimization, and distillation.

Contribution

The authors propose an acceleration framework for VAE decoders that stays aligned with the original latent distribution. It combines independence-aware channel pruning, stage-wise dominant operator optimization, and a three-phase dynamic distillation scheme.

Findings

01. Achieves an approximately 6× speedup in VAE decoding.

02. Maintains up to 96.9% of the original reconstruction quality.

03. Accelerates the end-to-end video generation pipeline by 36%.
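
The decoder speedup and the end-to-end gain are related through Amdahl's law: only the decoder's share of the runtime gets faster. The sketch below is a back-of-the-envelope aid, not a calculation from the paper. The function names are hypothetical, and the page does not say whether "36%" means a 1.36× throughput gain or a 36% reduction in wall-clock time, so both readings are shown as assumptions.

```python
def end_to_end_speedup(decoder_fraction: float, decoder_speedup: float) -> float:
    """Amdahl's law: overall speedup when only the decoder share is accelerated.

    decoder_fraction: share of baseline runtime spent in the VAE decoder (0..1).
    decoder_speedup:  factor by which the decoder alone gets faster (> 0).
    """
    if not 0.0 <= decoder_fraction <= 1.0:
        raise ValueError("decoder_fraction must lie in [0, 1]")
    if decoder_speedup <= 0.0:
        raise ValueError("decoder_speedup must be positive")
    return 1.0 / ((1.0 - decoder_fraction) + decoder_fraction / decoder_speedup)


def implied_decoder_fraction(overall_speedup: float, decoder_speedup: float) -> float:
    """Invert Amdahl's law: decoder share of baseline runtime implied by both speedups."""
    if decoder_speedup <= 1.0:
        raise ValueError("decoder_speedup must exceed 1 to invert the formula")
    return (1.0 - 1.0 / overall_speedup) / (1.0 - 1.0 / decoder_speedup)


if __name__ == "__main__":
    # Assumption A: "36%" means the pipeline runs 1.36x faster overall.
    share_a = implied_decoder_fraction(overall_speedup=1.36, decoder_speedup=6.0)
    # Assumption B: "36%" means total wall-clock time drops by 36%.
    share_b = implied_decoder_fraction(overall_speedup=1.0 / 0.64, decoder_speedup=6.0)
    print(f"implied decoder share (reading A): {share_a:.3f}")
    print(f"implied decoder share (reading B): {share_b:.3f}")
```

Under these assumptions, a 6× decoder speedup would correspond to the decoder taking roughly a third (reading A) to somewhat over two fifths (reading B) of the baseline runtime. These shares are inferred from the headline numbers only and are not reported on this page.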

Abstract

Latent diffusion models have enabled high-quality video synthesis, yet their inference remains costly and time-consuming. As diffusion transformers become increasingly efficient, the latency bottleneck inevitably shifts to VAE decoders. To reduce their latency while maintaining quality, we propose a universal acceleration framework for VAE decoders that preserves full alignment with the original latent distribution. Specifically, we propose (1) an independence-aware channel pruning method to effectively mitigate severe channel redundancy, and (2) a stage-wise dominant operator optimization strategy to address the high inference cost of the widely used causal 3D convolutions in VAE decoders. Based on these innovations, we construct a Flash-VAED family. Moreover, we design a three-phase dynamic distillation framework that efficiently transfers the capabilities of the original VAE decoder…
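
For readers unfamiliar with the term, "causal" in causal 3D convolution generally means that, along the time axis, each output frame depends only on the current and earlier input frames. The plain-Python sketch below shows the idea in one dimension only. It is an illustration under that simplification, not the paper's operator or its optimization, and `causal_conv1d` is a hypothetical name.

```python
def causal_conv1d(signal, kernel):
    """Causal 1D convolution (cross-correlation form, as in deep learning).

    output[t] is a weighted sum of signal[t-k+1 .. t], so it never reads
    future samples. Causality comes from padding on the past side only.
    """
    k = len(kernel)
    if k == 0:
        raise ValueError("kernel must be non-empty")
    padded = [0.0] * (k - 1) + list(signal)  # zeros stand in for missing past samples
    return [
        sum(kernel[j] * padded[t + j] for j in range(k))
        for t in range(len(signal))
    ]


if __name__ == "__main__":
    frames = [1, 2, 3, 4]
    print(causal_conv1d(frames, [1, 1, 1]))
    # Changing a later sample leaves earlier outputs untouched.
    print(causal_conv1d([1, 2, 99, 4], [1, 1, 1])[:2])
```

In a video VAE decoder the same past-only padding is applied along the temporal dimension, while the spatial dimensions are typically padded symmetrically.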

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Video Coding and Compression Technologies · Image Enhancement Techniques