OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model
Liuhan Chen, Zongjian Li, Bin Lin, Bin Zhu, Qian Wang, Shenghai Yuan, Xing Zhou, Xinhua Cheng, Li Yuan

TL;DR
This paper introduces OD-VAE, a novel omni-dimensional video compression VAE that compresses videos both spatially and temporally, significantly improving the efficiency of latent video diffusion models while maintaining high reconstruction quality.
Contribution
The paper proposes OD-VAE, a VAE that performs joint spatial-temporal compression of videos, along with four variants, a tail initialization, and an inference strategy for arbitrary-length videos.
Findings
OD-VAE achieves high reconstruction accuracy with efficient compression.
Four variants of OD-VAE offer different trade-offs between quality and speed.
Experiments show improved efficiency in latent video diffusion models.
Abstract
Variational Autoencoder (VAE), compressing videos into latent representations, is a crucial preceding component of Latent Video Diffusion Models (LVDMs). With the same reconstruction quality, the more sufficient the VAE's compression for videos is, the more efficient the LVDMs are. However, most LVDMs utilize 2D image VAE, whose compression for videos is only in the spatial dimension and often ignored in the temporal dimension. How to conduct temporal compression for videos in a VAE to obtain more concise latent representations while promising accurate reconstruction is seldom explored. To fill this gap, we propose an omni-dimension compression VAE, named OD-VAE, which can temporally and spatially compress videos. Although OD-VAE's more sufficient compression brings a great challenge to video reconstruction, it can still achieve high reconstructed accuracy by our fine design. To obtain…
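As a rough illustration of the abstract's efficiency argument, the sketch below counts the latent positions a diffusion model would process when a clip is compressed only spatially versus both temporally and spatially. The clip size, the downsampling factors, and the helper names `latent_shape` and `latent_positions` are hypothetical values chosen for illustration; they are not taken from the text above, and the plain integer division ignores any special handling of frames that an actual VAE might apply.

```python
def latent_shape(frames, height, width, t_factor, s_factor):
    """Return (latent_frames, latent_height, latent_width).

    Assumes simple integer-factor downsampling along each axis
    (an illustrative simplification, not the paper's exact scheme).
    """
    return (frames // t_factor, height // s_factor, width // s_factor)


def latent_positions(shape):
    """Number of spatio-temporal positions in a latent of the given shape."""
    t, h, w = shape
    return t * h * w


# Hypothetical clip: 16 frames at 256x256 pixels.
frames, height, width = 16, 256, 256

# Hypothetical 2D image VAE: spatial downsampling only (t_factor=1).
spatial_only = latent_shape(frames, height, width, t_factor=1, s_factor=8)

# Hypothetical omni-dimensional VAE: temporal and spatial downsampling.
omni = latent_shape(frames, height, width, t_factor=4, s_factor=8)

print("spatial-only latent:", spatial_only, latent_positions(spatial_only))
print("omni-dimensional latent:", omni, latent_positions(omni))
```

Under these assumed factors, the temporally compressed latent has one quarter as many positions as the spatial-only latent, which is the kind of reduction the abstract credits for making LVDMs more efficient.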
Taxonomy
Topics: Video Coding and Compression Technologies · Video Analysis and Summarization · Generative Adversarial Networks and Image Synthesis
Methods: Diffusion
