EVATok: Adaptive Length Video Tokenization for Efficient Visual Autoregressive Generation

Tianwei Xiong; Jun Hao Liew; Zilong Huang; Zhijie Lin; Jiashi Feng; Xihui Liu

arXiv:2603.12267 · cs.CV · March 13, 2026

TL;DR

EVATok introduces an adaptive video tokenization framework that optimizes token assignment per video, significantly improving efficiency and quality in autoregressive video generation.

Contribution

The paper proposes an adaptive tokenization method in which lightweight routers predict the optimal token assignment for each video, improving efficiency and quality over fixed-length approaches.

Findings

01. Achieves at least 24.4% token savings compared to prior methods.

02. Delivers superior reconstruction quality and state-of-the-art class-to-video generation on UCF-101.

03. Demonstrates substantial efficiency improvements in video autoregressive models.

Abstract

Autoregressive (AR) video generative models rely on video tokenizers that compress pixels into discrete token sequences. The length of these token sequences is crucial for balancing reconstruction quality against downstream generation computational cost. Traditional video tokenizers apply a uniform token assignment across temporal blocks of different videos, often wasting tokens on simple, static, or repetitive segments while underserving dynamic or complex ones. To address this inefficiency, we introduce EVATok, a framework to produce Efficient Video Adaptive Tokenizers. Our framework estimates optimal token assignments for each video to achieve the best quality-cost trade-off, develops lightweight routers for fast prediction of these optimal assignments, and trains adaptive tokenizers that encode videos based on the assignments…
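The idea of a per-block quality-cost trade-off can be illustrated with a toy sketch. This is not the paper's algorithm: the function name `pick_assignment`, the candidate token counts, the error values, and the linear penalty `cost_per_token` are all hypothetical, and each temporal block is scored independently for simplicity. The sketch only shows how a static block can end up with fewer tokens than a dynamic one once token count is penalized.

```python
from typing import Dict, List


def pick_assignment(block_errors: List[Dict[int, float]],
                    cost_per_token: float) -> List[int]:
    """Pick, for each temporal block, the token count with the lowest
    (reconstruction error + cost_per_token * token count).

    block_errors[i] maps a candidate token count to a hypothetical
    reconstruction error for block i. Ties go to the smaller count.
    """
    assignment = []
    for errors in block_errors:
        best = min(errors, key=lambda n: (errors[n] + cost_per_token * n, n))
        assignment.append(best)
    return assignment


# Hypothetical error tables (made-up numbers, not results from the paper).
static_block = {64: 0.10, 128: 0.09, 256: 0.085}   # extra tokens barely help
dynamic_block = {64: 0.40, 128: 0.22, 256: 0.12}   # extra tokens help a lot

assignment = pick_assignment([static_block, dynamic_block],
                             cost_per_token=0.0005)
adaptive_total = sum(assignment)
uniform_total = 256 * 2  # fixed-length baseline: 256 tokens for every block
savings = 1 - adaptive_total / uniform_total

print(assignment, adaptive_total, uniform_total, savings)
```

With these made-up numbers the static block is assigned 64 tokens and the dynamic block 256, so the sequence uses 320 tokens instead of 512. In the framework described above, a lightweight router would predict such an assignment directly rather than searching over candidates for every video.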

Code & Models

Models

YuuTennYi/EVATok (Hugging Face model)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications · Video Coding and Compression Technologies