Free-VSC: Free Semantics from Visual Foundation Models for Unsupervised Video Semantic Compression
Yuan Tian, Guo Lu, Guangtao Zhai

TL;DR
This paper introduces Free-VSC, a novel unsupervised video semantic compression method that leverages rich semantics from visual foundation models and a dynamic semantic trajectory scheme to improve compression efficiency and task performance.
Contribution
It proposes a semantic alignment layer with prompts for integrating foundation model semantics and a trajectory-based inter-frame compression scheme for better efficiency.
Findings
Outperforms previous methods on three tasks and six datasets.
Achieves better semantic preservation with reduced bitcost.
Effectively integrates foundation model semantics into video compression.
Abstract
Unsupervised video semantic compression (UVSC), i.e., compressing videos to better support various analysis tasks, has recently garnered attention. However, the semantic richness of previous methods remains limited, due to the single semantic learning objective, limited training data, etc. To address this, we propose to boost the UVSC task by absorbing the off-the-shelf rich semantics from visual foundation models (VFMs). Specifically, we introduce a VFMs-shared semantic alignment layer, complemented by VFM-specific prompts, to flexibly align semantics between the compressed video and various VFMs. This allows different VFMs to collaboratively build a mutually-enhanced semantic space, guiding the learning of the compression model. Moreover, we introduce a dynamic trajectory-based inter-frame compression scheme, which first estimates the semantic trajectory based on the historical content, and then traverses along…
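The idea of one alignment layer shared across VFMs, with a small prompt per VFM, can be illustrated with a toy sketch. This is not the authors' implementation: the class name `SharedAlignment`, the choice to add the prompt to the feature before a shared linear projection, the cosine-distance loss, and the assumption that all VFM features share one dimensionality are hypothetical simplifications made only for illustration.

```python
import math
import random


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b + 1e-12)


class SharedAlignment:
    """Hypothetical sketch: one projection shared by all VFMs, one prompt per VFM."""

    def __init__(self, feat_dim, vfm_names, seed=0):
        rng = random.Random(seed)
        # Shared weights: the same matrix is used for every VFM.
        self.weight = [
            [rng.gauss(0.0, 0.1) for _ in range(feat_dim)] for _ in range(feat_dim)
        ]
        # VFM-specific prompts: one small vector per foundation model.
        self.prompts = {
            name: [rng.gauss(0.0, 0.1) for _ in range(feat_dim)] for name in vfm_names
        }

    def align(self, compressed_feat, vfm_name):
        # Assumption: the prompt conditions the input by simple addition.
        prompt = self.prompts[vfm_name]
        conditioned = [x + p for x, p in zip(compressed_feat, prompt)]
        return matvec(self.weight, conditioned)


def alignment_loss(layer, compressed_feat, vfm_feats):
    """Average cosine distance between aligned features and each VFM's features."""
    losses = [
        cosine_distance(layer.align(compressed_feat, name), target)
        for name, target in vfm_feats.items()
    ]
    return sum(losses) / len(losses)


# Toy usage with made-up feature vectors for two placeholder VFMs.
layer = SharedAlignment(feat_dim=4, vfm_names=["vfm_a", "vfm_b"])
feat = [0.5, -0.2, 0.1, 0.9]
targets = {"vfm_a": [1.0, 0.0, 0.0, 0.0], "vfm_b": [0.0, 1.0, 0.0, 0.0]}
loss = alignment_loss(layer, feat, targets)
```

In this sketch only the prompt differs between VFMs, so adding another foundation model costs one extra vector rather than a new layer; whether the paper realizes the prompts this way is not stated in the text above.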
Taxonomy
Topics: Advanced Data Compression Techniques · Video Analysis and Summarization · Video Coding and Compression Technologies
Methods: ALIGN
