M3-CVC: Controllable Video Compression with Multimodal Generative Models

Rui Wan; Qi Zheng; Yibo Fan

arXiv:2411.15798·eess.IV·December 30, 2024

Open Access

TL;DR

M3-CVC introduces a controllable video compression framework using multimodal generative models, significantly improving ultra-low-bitrate video quality by leveraging semantic, motion, and textual guidance for high-fidelity reconstruction.

Contribution

The paper presents a novel controllable video compression method that integrates multimodal generative models, hierarchical spatiotemporal feature extraction, and diffusion-based text-guided keyframe compression.

Findings

01 Outperforms VVC in ultra-low-bitrate scenarios

02 Preserves semantic and perceptual fidelity effectively

03 Utilizes text-guided diffusion for high-quality frame reconstruction

Abstract

Traditional and neural video codecs commonly encounter limitations in controllability and generality under ultra-low-bitrate coding scenarios. To overcome these challenges, we propose M3-CVC, a controllable video compression framework incorporating multimodal generative models. The framework utilizes a semantic-motion composite strategy for keyframe selection to retain critical information. For each keyframe and its corresponding video clip, a dialogue-based large multimodal model (LMM) approach extracts hierarchical spatiotemporal details, enabling both inter-frame and intra-frame representations for improved video fidelity while enhancing encoding interpretability. M3-CVC further employs a conditional diffusion-based, text-guided keyframe compression method, achieving high fidelity in frame reconstruction. During decoding, textual descriptions derived from LMMs guide the diffusion…
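
The abstract names a semantic-motion composite strategy for keyframe selection but does not spell out its details here. The following is a minimal sketch of what such a strategy could look like, not the authors' method. It assumes each frame already has a semantic feature vector and a normalized motion magnitude; the function names, the weight alpha, and the threshold are hypothetical choices made only for illustration.

```python
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return 1.0 - dot / (norm_a * norm_b)


def select_keyframes(
    embeddings: Sequence[Sequence[float]],
    motion: Sequence[float],
    alpha: float = 0.5,       # hypothetical weight on semantic change
    threshold: float = 0.3,   # hypothetical trigger level for a new keyframe
) -> List[int]:
    """Pick keyframe indices from a composite semantic + motion score.

    embeddings[i] is an assumed semantic feature vector for frame i.
    motion[i] is an assumed motion magnitude between frames i-1 and i
    (motion[0] is ignored).
    """
    if len(embeddings) != len(motion):
        raise ValueError("embeddings and motion must have the same length")
    if not embeddings:
        return []

    keyframes = [0]            # the first frame always starts a clip
    accumulated_motion = 0.0   # motion gathered since the last keyframe

    for i in range(1, len(embeddings)):
        accumulated_motion += motion[i]
        # Semantic drift is measured against the most recent keyframe.
        semantic_change = cosine_distance(embeddings[keyframes[-1]], embeddings[i])
        score = alpha * semantic_change + (1.0 - alpha) * accumulated_motion
        if score >= threshold:
            keyframes.append(i)
            accumulated_motion = 0.0

    return keyframes
```

In this sketch a new keyframe is opened either when the content drifts semantically from the last keyframe or when enough motion has accumulated, so each keyframe and the frames that follow it form one clip. How the real system weights the two signals is not stated in the text above.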

Taxonomy

Topics: Advanced Data Compression Techniques · Video Coding and Compression Technologies

Methods: Diffusion