SMC++: Masked Learning of Unsupervised Video Semantic Compression
Yuan Tian, Xiaoyue Ling, Cong Geng, Qiang Hu, Guo Lu, Guangtao Zhai

TL;DR
This paper introduces SMC++, a video compression framework that preserves semantic information for downstream analysis by combining masked video modeling, entropy regularization, and Transformer-based modules.
Contribution
The paper proposes a self-supervised, semantic-preserving compression method that pairs a new masked motion prediction task with Transformer-based compression modules, advancing semantic video coding.
Findings
Outperforms traditional and learnable codecs on multiple datasets
Enhances downstream video analysis tasks
Effectively preserves semantic content during compression
Abstract
Most video compression methods focus on human visual perception, neglecting semantic preservation. This leads to severe semantic loss during compression, hampering downstream video analysis tasks. In this paper, we propose a Masked Video Modeling (MVM)-powered compression framework that specifically preserves video semantics by jointly mining and compressing the semantics in a self-supervised manner. While MVM is proficient at learning generalizable semantics through the masked patch prediction task, it may also encode non-semantic information such as trivial textural details, wasting bit cost and introducing semantic noise. To suppress this, we explicitly regularize the non-semantic entropy of the compressed video in the MVM token space. The proposed framework is instantiated as a simple Semantic-Mining-then-Compression (SMC) model. Furthermore, we extend SMC as an advanced SMC++ model…
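The two ingredients named in the abstract, a prediction loss computed only on masked patches and an entropy penalty on the compressed representation, can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the mean-squared-error loss, and the weight `lam` are all assumptions made for illustration, and a generic empirical Shannon entropy stands in for the paper's non-semantic entropy term, whose exact form the abstract does not give.

```python
import math
import random
from collections import Counter


def random_mask(num_patches, mask_ratio, seed=0):
    """Return a list of booleans; True marks a patch hidden from the model."""
    num_masked = round(num_patches * mask_ratio)
    rng = random.Random(seed)
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]


def masked_prediction_loss(pred, target, mask):
    """Mean squared error over masked patches only (assumed loss form)."""
    errors = [
        sum((p - t) ** 2 for p, t in zip(pred[i], target[i])) / len(target[i])
        for i, hidden in enumerate(mask)
        if hidden
    ]
    return sum(errors) / len(errors) if errors else 0.0


def empirical_entropy_bits(symbols):
    """Shannon entropy in bits per symbol, used here as a rough bit-cost proxy."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def total_loss(pred, target, mask, symbols, lam=0.1):
    """Masked prediction loss plus a weighted entropy penalty (hypothetical combination)."""
    return masked_prediction_loss(pred, target, mask) + lam * empirical_entropy_bits(symbols)


# Toy data: 8 patches with 2 features each; predictions are off by 1.0 everywhere.
mask = random_mask(num_patches=8, mask_ratio=0.75)
target = [[float(i), float(i)] for i in range(8)]
pred = [[value + 1.0 for value in patch] for patch in target]
symbols = [0, 0, 1, 1]  # hypothetical quantized tokens of the compressed video
print(sum(mask), total_loss(pred, target, mask, symbols))
```

Restricting the loss to masked patches is what makes the task predictive rather than a plain copy, while the entropy term penalizes representations that cost more bits. In this sketch the two terms are simply summed with a fixed weight; how the paper balances them is not stated in the truncated abstract.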
Taxonomy
Topics: Advanced Data Compression Techniques · Video Analysis and Summarization
Methods: Focus · ALIGN
