Adaptive 1D Video Diffusion Autoencoder

Yao Teng; Minxuan Lin; Xian Liu; Shuai Wang; Xiao Yang; Xihui Liu

arXiv:2602.04220 · cs.CV · February 5, 2026

Open Access

TL;DR

This paper introduces One-DVA, a transformer-based video autoencoder that compresses videos into variable-length 1D latents and reconstructs them with a pixel-space diffusion decoder, addressing the limitations of fixed-rate compression and deterministic decoding.

Contribution

The paper presents an adaptive 1D video autoencoder with transformer-based encoding and diffusion-based decoding, enabling variable-length compression and better support for downstream generative modeling.

Findings

01. Achieves reconstruction quality comparable to 3D-CNN VAEs at similar compression ratios.

02. Supports adaptive compression, allowing higher compression ratios.

03. Regularizes the latent distribution for better generative modeling.

Abstract

Recent video generation models largely rely on video autoencoders that compress pixel-space videos into latent representations. However, existing video autoencoders suffer from three major limitations: (1) fixed-rate compression that wastes tokens on simple videos, (2) inflexible CNN architectures that prevent variable-length latent modeling, and (3) deterministic decoders that struggle to recover appropriate details from compressed latents. To address these issues, we propose One-Dimensional Diffusion Video Autoencoder (One-DVA), a transformer-based framework for adaptive 1D encoding and diffusion-based decoding. The encoder employs query-based vision transformers to extract spatiotemporal features and produce latent representations, while a variable-length dropout mechanism dynamically adjusts the latent length. The decoder is a pixel-space diffusion transformer that reconstructs…
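
The variable-length dropout mentioned in the abstract can be illustrated with a small sketch. The abstract does not say how tokens are dropped, so everything here is an assumption made for illustration: tokens are dropped from the tail of the 1D latent sequence, the kept length is sampled uniformly, and the helper names `variable_length_dropout` and `pad_with_mask` are hypothetical, not taken from the paper.

```python
import random


def variable_length_dropout(latents, min_keep=1, rng=None):
    """Keep a random-length prefix of a 1D latent token sequence.

    Assumption (not from the paper): tokens are dropped from the tail,
    and the kept length is uniform over [min_keep, len(latents)].
    """
    if not 1 <= min_keep <= len(latents):
        raise ValueError("min_keep must be between 1 and len(latents)")
    if rng is None:
        rng = random.Random()
    keep = rng.randint(min_keep, len(latents))  # inclusive on both ends
    return latents[:keep]


def pad_with_mask(latents, full_length, pad_token):
    """Pad a shortened sequence to full_length; return (tokens, mask).

    mask[i] is True for real latent tokens and False for padding, so a
    decoder could ignore the padded positions.
    """
    n_pad = full_length - len(latents)
    if n_pad < 0:
        raise ValueError("latents is longer than full_length")
    tokens = list(latents) + [pad_token] * n_pad
    mask = [True] * len(latents) + [False] * n_pad
    return tokens, mask


if __name__ == "__main__":
    # Eight hypothetical latent tokens, each a 4-dimensional vector.
    latents = [[float(i)] * 4 for i in range(8)]
    kept = variable_length_dropout(latents, min_keep=2, rng=random.Random(0))
    tokens, mask = pad_with_mask(kept, full_length=8, pad_token=[0.0] * 4)
    print(len(kept), mask)
```

Training a decoder on such randomly shortened sequences is one plausible way to let a single model reconstruct from different latent lengths. Whether One-DVA drops a prefix, a suffix, or arbitrary tokens is not stated in the text above.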

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Advanced Data Compression Techniques