M3-CVC: Controllable Video Compression with Multimodal Generative Models
Rui Wan, Qi Zheng, Yibo Fan

TL;DR
M3-CVC introduces a controllable video compression framework using multimodal generative models, significantly improving ultra-low-bitrate video quality by leveraging semantic, motion, and textual guidance for high-fidelity reconstruction.
Contribution
The paper presents a novel controllable video compression method that integrates multimodal generative models, hierarchical spatiotemporal feature extraction, and diffusion-based text-guided keyframe compression.
Findings
Outperforms VVC in ultra-low-bitrate scenarios
Preserves semantic and perceptual fidelity effectively
Utilizes text-guided diffusion for high-quality frame reconstruction
Abstract
Traditional and neural video codecs commonly encounter limitations in controllability and generality under ultra-low-bitrate coding scenarios. To overcome these challenges, we propose M3-CVC, a controllable video compression framework incorporating multimodal generative models. The framework utilizes a semantic-motion composite strategy for keyframe selection to retain critical information. For each keyframe and its corresponding video clip, a dialogue-based large multimodal model (LMM) approach extracts hierarchical spatiotemporal details, enabling both inter-frame and intra-frame representations for improved video fidelity while enhancing encoding interpretability. M3-CVC further employs a conditional diffusion-based, text-guided keyframe compression method, achieving high fidelity in frame reconstruction. During decoding, textual descriptions derived from LMMs guide the diffusion…
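As a rough illustration of what a semantic-motion composite keyframe selection could look like, the sketch below blends a semantic distance with a motion proxy and starts a new keyframe whenever the blended score exceeds a threshold. This is not the authors' algorithm, whose exact criterion is not given in this summary. The function names, the `alpha` weighting, the `threshold` value, the use of cosine distance on precomputed embeddings, and the mean absolute pixel difference as a stand-in for motion are all assumptions made for illustration.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as "no semantic change"
    return 1.0 - dot / (norm_a * norm_b)


def mean_abs_diff(frame_a, frame_b):
    """Crude motion proxy: mean absolute pixel difference (pixels in [0, 1])."""
    return sum(abs(x - y) for x, y in zip(frame_a, frame_b)) / len(frame_a)


def select_keyframes(frames, embeddings, alpha=0.5, threshold=0.3):
    """Hypothetical composite rule, not the paper's method.

    frames:     list of flattened frames (lists of floats in [0, 1])
    embeddings: list of per-frame semantic vectors from some encoder (assumed)
    alpha:      weight of the semantic term versus the motion term (assumed)
    threshold:  composite score above which a new keyframe starts (assumed)
    """
    if not frames:
        return []
    keyframes = [0]  # the first frame always opens a clip
    for i in range(1, len(frames)):
        k = keyframes[-1]
        semantic = cosine_distance(embeddings[k], embeddings[i])
        motion = mean_abs_diff(frames[k], frames[i])
        score = alpha * semantic + (1 - alpha) * motion
        if score > threshold:
            keyframes.append(i)
    return keyframes


# Toy input: two static frames, then a scene change, then a static frame.
frames = [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
embeddings = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
keyframe_indices = select_keyframes(frames, embeddings)
```

Comparing each frame against the most recent keyframe, rather than against its immediate predecessor, lets slow drift accumulate until it triggers a new keyframe. A real system would likely replace the pixel difference with optical flow or codec motion vectors.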
Taxonomy
Topics: Advanced Data Compression Techniques · Video Coding and Compression Technologies
Methods: Diffusion
