EVATok: Adaptive Length Video Tokenization for Efficient Visual Autoregressive Generation
Tianwei Xiong, Jun Hao Liew, Zilong Huang, Zhijie Lin, Jiashi Feng, Xihui Liu

TL;DR
EVATok introduces an adaptive video tokenization framework that optimizes token assignment per video, significantly improving efficiency and quality in autoregressive video generation.
Contribution
EVATok proposes an adaptive tokenization method in which lightweight routers predict the optimal token assignment for each video, improving efficiency and quality over fixed-length approaches.
Findings
Achieves at least 24.4% token savings compared to prior methods.
Delivers superior reconstruction quality and state-of-the-art class-to-video generation on UCF-101.
Demonstrates substantial efficiency improvements in video autoregressive models.
Abstract
Autoregressive (AR) video generative models rely on video tokenizers that compress pixels into discrete token sequences. The length of these token sequences is crucial for balancing reconstruction quality against downstream generation computational cost. Traditional video tokenizers apply a uniform token assignment across temporal blocks of different videos, often wasting tokens on simple, static, or repetitive segments while underserving dynamic or complex ones. To address this inefficiency, we introduce EVATok, a framework to produce Efficient Video Adaptive Tokenizers. Our framework estimates optimal token assignments for each video to achieve the best quality-cost trade-off, develops lightweight routers for fast prediction of these optimal assignments, and trains adaptive tokenizers that encode videos based on the assignments…
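The quality-cost trade-off described in the abstract can be illustrated with a small sketch. This is not the authors' estimator: the scoring rule (quality minus a weighted token count), the candidate token counts, the weight `lam`, and the toy quality function are all assumptions made for illustration. It only shows the idea of choosing a per-block token assignment instead of a uniform one.

```python
from itertools import product


def best_assignment(candidates, quality_fn, num_blocks, lam=0.0005):
    """Score every per-block token assignment and return the best one.

    Hypothetical scoring rule: quality minus lam * total tokens.
    The paper's actual estimation procedure is not given in this excerpt.
    """
    best, best_score = None, float("-inf")
    for assignment in product(candidates, repeat=num_blocks):
        score = quality_fn(assignment) - lam * sum(assignment)
        if score > best_score:
            best, best_score = assignment, score
    return best, best_score


# Toy stand-in for reconstruction quality (an assumption): each temporal block
# "needs" some number of tokens, and quality saturates once that need is met.
needs = [128, 512]  # a static block followed by a dynamic block


def toy_quality(assignment):
    return sum(min(1.0, t / need) for t, need in zip(assignment, needs))


assignment, score = best_assignment((128, 256, 512), toy_quality, num_blocks=2)
uniform_tokens = 512 * len(needs)  # fixed-length baseline at the largest budget
```

In this toy setting the search gives the static block the smallest budget and the dynamic block the largest, so the total token count falls below the uniform baseline. The exhaustive search grows exponentially with the number of blocks, which is one way to see why a fast learned predictor of the assignment is attractive.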
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications · Video Coding and Compression Technologies
