Flash-VAED: Plug-and-Play VAE Decoders for Efficient Video Generation
Lunjie Zhu, Yushi Huang, Xingtong Ge, Yufei Xue, Zhening Liu, Yumeng Zhang, Zehong Lin, Jun Zhang

TL;DR
This paper introduces Flash-VAED, a universal framework for accelerating VAE decoders in video generation. It combines channel pruning, operator optimization, and distillation to cut decoding latency by approximately 6× while preserving high reconstruction quality.
Contribution
The authors propose an acceleration framework for VAE decoders that stays fully aligned with the original latent distribution. It combines independence-aware channel pruning, stage-wise dominant operator optimization, and a three-phase dynamic distillation scheme.
Findings
Achieves approximately 6× speedup in VAE decoding.
Maintains up to 96.9% of the original reconstruction quality.
Accelerates the end-to-end video generation pipeline by 36%.
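The gap between the roughly 6× decoder speedup and the 36% end-to-end gain is what Amdahl's law predicts when only part of the pipeline is accelerated. The sketch below is illustrative only. The summary does not state what share of pipeline time the decoder takes, nor whether "36%" means a 1.36× speedup or a 36% latency reduction, so both readings are treated as assumptions, and the function names are hypothetical.

```python
def end_to_end_speedup(decoder_fraction: float, decoder_speedup: float) -> float:
    """Amdahl's law: overall speedup when only the decoder stage gets faster.

    decoder_fraction: share of original end-to-end time spent in the VAE
                      decoder (assumed; not reported in the summary).
    decoder_speedup:  factor by which the decoder alone is accelerated.
    """
    if not 0.0 <= decoder_fraction <= 1.0:
        raise ValueError("decoder_fraction must lie in [0, 1]")
    if decoder_speedup <= 0.0:
        raise ValueError("decoder_speedup must be positive")
    return 1.0 / ((1.0 - decoder_fraction) + decoder_fraction / decoder_speedup)


def implied_decoder_fraction(overall_speedup: float, decoder_speedup: float) -> float:
    """Invert Amdahl's law to get the decoder's share of the original runtime."""
    return (1.0 - 1.0 / overall_speedup) / (1.0 - 1.0 / decoder_speedup)


if __name__ == "__main__":
    s_dec = 6.0  # approximate decoder speedup quoted in the findings

    # Reading A (assumption): "36%" means the pipeline runs 1.36x faster.
    f_a = implied_decoder_fraction(1.36, s_dec)

    # Reading B (assumption): "36%" means latency drops by 36%,
    # i.e. an overall speedup of 1 / (1 - 0.36).
    f_b = implied_decoder_fraction(1.0 / (1.0 - 0.36), s_dec)

    print(f"implied decoder share, reading A: {f_a:.3f}")
    print(f"implied decoder share, reading B: {f_b:.3f}")
```

Under these assumptions the decoder would account for about 0.32 (reading A) or about 0.43 (reading B) of the original end-to-end time. Neither figure comes from the paper; they only show that the two headline numbers are mutually consistent if the decoder is a substantial but not dominant part of the pipeline.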
Abstract
Latent diffusion models have enabled high-quality video synthesis, yet their inference remains costly and time-consuming. As diffusion transformers become increasingly efficient, the latency bottleneck inevitably shifts to VAE decoders. To reduce their latency while maintaining quality, we propose a universal acceleration framework for VAE decoders that preserves full alignment with the original latent distribution. Specifically, we propose (1) an independence-aware channel pruning method to effectively mitigate severe channel redundancy, and (2) a stage-wise dominant operator optimization strategy to address the high inference cost of the widely used causal 3D convolutions in VAE decoders. Based on these innovations, we construct a Flash-VAED family. Moreover, we design a three-phase dynamic distillation framework that efficiently transfers the capabilities of the original VAE decoder…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Video Coding and Compression Technologies · Image Enhancement Techniques
