Tokenizing Motion: A Generative Approach for Scene Dynamics Compression
Shanzhi Yin, Zihan Zhang, Bolin Chen, Shiqi Wang, Yan Ye

TL;DR
This paper introduces a generative video compression method that uses motion pattern priors for ultra-low bitrate transmission, enabling high-quality scene dynamics reconstruction across diverse content types.
Contribution
It presents a novel approach leveraging motion priors instead of content priors, with a dense-to-sparse encoding and flow-driven diffusion decoding for improved compression.
Findings
Outperforms the state-of-the-art conventional codec ECM in rate-distortion performance
Achieves high-quality reconstruction at ultra-low bitrates
Remains effective across diverse scene dynamics
Abstract
This paper proposes a novel generative video compression framework that leverages motion pattern priors, derived from subtle dynamics in common scenes (e.g., swaying flowers or a boat drifting on water), rather than relying on video content priors (e.g., talking faces or human bodies). These compact motion priors enable a new approach to ultra-low bitrate communication while achieving high-quality reconstruction across diverse scene contents. At the encoder side, motion priors can be streamlined into compact representations via a dense-to-sparse transformation. At the decoder side, these priors facilitate the reconstruction of scene dynamics using an advanced flow-driven diffusion model. Experimental results show that the proposed method achieves superior rate-distortion performance and outperforms the state-of-the-art conventional video codec Enhanced Compression Model (ECM)…
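To make the dense-to-sparse idea in the abstract concrete, here is a minimal sketch, not the authors' method. It assumes the motion is a dense optical-flow array of shape (H, W, 2) and that the sparse form is simply a block average on a coarse grid. The function names `dense_to_sparse` and `sparse_to_dense` and the `block` parameter are hypothetical. Nearest-neighbour upsampling stands in for the decoder, which in the paper is a flow-driven diffusion model.

```python
import numpy as np


def dense_to_sparse(flow: np.ndarray, block: int) -> np.ndarray:
    """Average a dense (H, W, 2) flow field over non-overlapping block x block cells.

    Hypothetical stand-in for a dense-to-sparse transformation.
    """
    h, w, c = flow.shape
    if h % block or w % block:
        raise ValueError("H and W must be multiples of block")
    # Split each spatial axis into (cells, block) and average inside each cell.
    cells = flow.reshape(h // block, block, w // block, block, c)
    return cells.mean(axis=(1, 3))


def sparse_to_dense(sparse: np.ndarray, block: int) -> np.ndarray:
    """Nearest-neighbour upsampling of a sparse flow grid back to dense resolution."""
    return np.repeat(np.repeat(sparse, block, axis=0), block, axis=1)


# Toy scene: every pixel drifts 1.5 px horizontally (e.g. a boat on calm water).
flow = np.zeros((8, 8, 2))
flow[..., 0] = 1.5

sparse = dense_to_sparse(flow, block=4)   # 2 x 2 grid of motion vectors
recon = sparse_to_dense(sparse, block=4)  # back to 8 x 8
ratio = flow.size // sparse.size          # values stored before vs. after
```

For smooth, low-amplitude motion like this toy drift, the coarse grid loses nothing while storing 16 times fewer values. Real scene dynamics are not uniform, which is why a learned generative decoder is needed to restore detail from such a compact representation.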
Taxonomy
Topics: Artificial Intelligence in Games · Data Visualization and Analytics
