OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model

Liuhan Chen; Zongjian Li; Bin Lin; Bin Zhu; Qian Wang; Shenghai Yuan; Xing Zhou; Xinhua Cheng; Li Yuan

arXiv:2409.01199 · cs.CV · September 10, 2024

TL;DR

This paper introduces OD-VAE, a novel omni-dimensional video compression VAE that compresses videos both spatially and temporally, significantly improving the efficiency of latent video diffusion models while maintaining high reconstruction quality.

Contribution

The paper proposes OD-VAE, a VAE that performs joint spatial-temporal compression of videos, along with four variants, a tail initialization, and an inference strategy for arbitrary-length videos.

Findings

01 OD-VAE achieves high reconstruction accuracy with efficient compression.

02 Four variants of OD-VAE offer different trade-offs between quality and speed.

03 Experiments show improved efficiency in latent video diffusion models.

Abstract

A Variational Autoencoder (VAE), which compresses videos into latent representations, is a crucial preceding component of Latent Video Diffusion Models (LVDMs). At the same reconstruction quality, the more thoroughly the VAE compresses videos, the more efficient the LVDMs are. However, most LVDMs use a 2D image VAE, which compresses videos only in the spatial dimension and often ignores the temporal dimension. How to perform temporal compression in a VAE to obtain more compact latent representations while still ensuring accurate reconstruction is seldom explored. To fill this gap, we propose an omni-dimensional compression VAE, named OD-VAE, which compresses videos both temporally and spatially. Although OD-VAE's stronger compression makes video reconstruction considerably harder, it still achieves high reconstruction accuracy through careful design. To obtain…
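
The efficiency argument in the abstract can be illustrated with a small shape calculation. The sketch below compares how many latent positions a diffusion model would have to process when a video is compressed only spatially versus both spatially and temporally. The compression factors (8 per spatial axis, 4 along time), the clip size, and the helper names `latent_shape` and `latent_size` are illustrative assumptions and are not taken from the text above; any special handling of the first frame is ignored.

```python
from math import ceil


def latent_shape(frames, height, width, t_factor, s_factor):
    """Latent grid size (frames, height, width) after downsampling.

    t_factor and s_factor are hypothetical temporal and spatial
    compression factors; they are not values stated in the abstract.
    """
    return (
        ceil(frames / t_factor),
        ceil(height / s_factor),
        ceil(width / s_factor),
    )


def latent_size(shape):
    """Number of latent positions (per channel) in a latent grid."""
    t, h, w = shape
    return t * h * w


# Illustrative clip: 64 frames at 256x256 pixels.
video = (64, 256, 256)

# Spatial-only compression, as with a 2D image VAE applied frame by frame.
spatial_only = latent_shape(*video, t_factor=1, s_factor=8)

# Joint spatial and temporal compression (assumed factors).
omni = latent_shape(*video, t_factor=4, s_factor=8)

ratio = latent_size(spatial_only) / latent_size(omni)
print("spatial only:", spatial_only)
print("spatial + temporal:", omni)
print("reduction in latent positions:", ratio)
```

Under these assumed factors, the temporally compressed latent has a quarter as many positions, which is the kind of saving the abstract refers to when it links stronger compression to more efficient LVDMs.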

Code & Models

Repositories

pku-yuangroup/open-sora-plan
PyTorch · Official

Taxonomy

Topics: Video Coding and Compression Technologies · Video Analysis and Summarization · Generative Adversarial Networks and Image Synthesis

Methods: Diffusion