Resi-VidTok: An Efficient and Decomposed Progressive Tokenization Framework for Ultra-Low-Rate and Lightweight Video Transmission
Zhenyu Liu, Yi Ma, Rahim Tafazolli, Zhi Ding

TL;DR
Resi-VidTok is a framework for ultra-low-rate, lightweight video transmission. It uses importance-ordered token streams and adaptive coding to maintain robustness, semantic fidelity, and real-time performance over constrained wireless channels.
Contribution
Resi-VidTok introduces a resilient 1D tokenization pipeline for video that combines differential temporal token coding with a channel-adaptive scheme, enabling reliable, low-bandwidth, real-time video transmission without heavy models.
Findings
Robust visual and semantic quality at bandwidth ratios as low as 0.0004.
Real-time reconstruction at over 30 fps.
Effective handling of severe channel conditions with graceful quality degradation.
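To give a sense of scale for a bandwidth ratio of 0.0004, the sketch below computes a channel-symbol budget. It assumes the common convention that the bandwidth ratio is the number of channel symbols divided by the number of source values (pixels times color channels times frames); the summary here does not state the paper's exact definition, and the 256x256 RGB frame size and the `symbol_budget` helper are purely illustrative.

```python
def symbol_budget(height, width, channels, frames, bandwidth_ratio):
    """Channel symbols available for a clip.

    Assumption (not confirmed by the source): bandwidth ratio is defined as
    channel symbols divided by source values.
    """
    source_values = height * width * channels * frames
    return source_values * bandwidth_ratio


# Hypothetical example: one 256x256 RGB frame at the lowest reported ratio.
budget = symbol_budget(256, 256, 3, 1, 0.0004)
print(f"{budget:.1f} channel symbols per frame")
```

Under these assumptions, a single frame of 196,608 source values would have to be conveyed in roughly 79 channel symbols, which is why a compact, importance-ordered token representation matters at this operating point.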
Abstract
Real-time transmission of video over wireless networks remains highly challenging, even with advanced deep models, particularly under severe channel conditions such as limited bandwidth and weak connectivity. In this paper, we propose Resi-VidTok, a Resilient Tokenization-Enabled framework designed for ultra-low-rate and lightweight video transmission that delivers strong robustness while preserving perceptual and semantic fidelity on commodity digital hardware. By reorganizing spatio-temporal content into a discrete, importance-ordered token stream composed of key tokens and refinement tokens, Resi-VidTok enables progressive encoding, prefix-decodable reconstruction, and graceful quality degradation under constrained channels. A key contribution is a resilient 1D tokenization pipeline for video that integrates differential temporal token coding, explicitly supporting reliable recovery…
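The ideas of differential temporal token coding and prefix-decodable reconstruction can be illustrated with a toy sketch. This is not the authors' implementation: it assumes each frame is already a 1D list of integer token ids ordered from most to least important, that only tokens which changed since the previous frame are sent, and that the receiver may get only a prefix of those updates. The names `encode_diff` and `decode_prefix` and the token values are hypothetical.

```python
from typing import List, Tuple

Update = Tuple[int, int]  # (token position, new token id)


def encode_diff(prev: List[int], curr: List[int]) -> List[Update]:
    """Return updates for positions whose token changed since the previous frame.

    Updates come out in position order, which this sketch assumes is also
    importance order (position 0 is the most important token).
    """
    return [(i, c) for i, (p, c) in enumerate(zip(prev, curr)) if p != c]


def decode_prefix(prev: List[int], updates: List[Update], received: int) -> List[int]:
    """Apply only the first `received` updates.

    Positions without a received update reuse the previous frame's token,
    so a truncated stream still yields a complete (coarser) token list.
    """
    out = list(prev)
    for pos, tok in updates[:received]:
        out[pos] = tok
    return out


# Hypothetical token lists for two consecutive frames.
prev_tokens = [12, 7, 7, 3, 9, 1]
curr_tokens = [12, 8, 7, 4, 9, 2]

updates = encode_diff(prev_tokens, curr_tokens)
full = decode_prefix(prev_tokens, updates, len(updates))  # all updates arrive
partial = decode_prefix(prev_tokens, updates, 1)          # only the first arrives
```

With all updates received, the decoder recovers the current frame's tokens exactly; with only a prefix, it falls back to the previous frame's tokens for the missing positions, which is one simple way to picture graceful degradation under a constrained channel.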
