Cross Modal Compression: Towards Human-comprehensible Semantic Compression

Jiguo Li; Chuanmin Jia; Xinfeng Zhang; Siwei Ma; Wen Gao

arXiv:2209.02574·eess.IV·September 7, 2022

TL;DR

This paper introduces cross modal compression (CMC), a novel semantic compression framework that transforms visual data into human-understandable formats, achieving high compression ratios while preserving semantic content.

Contribution

The paper formulates CMC as a rate-distortion problem, compares it with traditional and feature compression, and demonstrates its effectiveness with qualitative and quantitative results.
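As a rough illustration of what a rate-distortion formulation of CMC could look like, the generic Lagrangian form is sketched below. The symbols are assumptions chosen for illustration and are not taken from the paper: $x$ is the visual input, $E$ maps it to the compact, human-comprehensible representation (for example text), $G$ reconstructs visual data from that representation, $R(\cdot)$ is the bitrate of the representation, $d_{\mathrm{sem}}$ is some measure of semantic distortion, and $\lambda$ sets the trade-off between the two.

```latex
% Generic rate-distortion objective (illustrative notation, not the paper's):
%   z = E(x)        compact, human-comprehensible representation
%   \hat{x} = G(z)  reconstruction from that representation
\begin{equation}
  \min_{E,\,G} \;\; R\bigl(E(x)\bigr)
  \;+\; \lambda \, d_{\mathrm{sem}}\bigl(x,\; G(E(x))\bigr),
  \qquad \lambda > 0 .
\end{equation}
```

Under this reading, traditional codecs correspond to choosing a signal-level distortion (such as mean squared error) in place of $d_{\mathrm{sem}}$, while CMC replaces it with a semantic measure and constrains the representation to a domain people can read directly.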

Findings

01. CMC achieves higher compression ratios than JPEG.

02. CMC preserves semantic information effectively.

03. Qualitative and quantitative evaluations validate CMC's performance.

Abstract

Traditional image/video compression aims to reduce transmission and storage cost while keeping signal fidelity as high as possible. However, with the increasing demand for machine analysis and semantic monitoring in recent years, semantic fidelity, rather than signal fidelity, is becoming another emerging concern in image/video compression. Building on recent advances in cross modal translation and generation, in this paper we propose cross modal compression (CMC), a semantic compression framework for visual data that transforms highly redundant visual data (such as images and videos) into a compact, human-comprehensible domain (such as text, sketch, semantic map, attributions, etc.) while preserving the semantics. Specifically, we first formulate the CMC problem as a rate-distortion optimization problem. Secondly, we investigate the relationship with the traditional image/video compression and…
