SMIC: Semantic Multi-Item Compression based on CLIP dictionary
Tom Bachard, Thomas Maugey

TL;DR
This paper introduces SMIC, a semantic multi-item compression method leveraging CLIP's latent space to efficiently compress image collections by exploiting inter-item redundancy while maintaining semantic fidelity.
Contribution
It extends semantic compression to image collections using a CLIP-based dictionary, outperforming existing generative codecs in compression rate without losing semantic quality.
Findings
Achieves a compression rate of around 10^-5 bits per pixel (BPP) per image.
Dictionary-based codec outperforms state-of-the-art generative codecs in terms of compression rate.
Dictionary acts as a semantic projector for image content.
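To give a feel for the rate quoted above, the short sketch below converts a bits-per-pixel figure into a per-image bit budget. The image resolutions are assumptions chosen for illustration only; the summary does not state which resolutions the authors used, and the helper names are hypothetical.

```python
def bits_for_image(bpp: float, width: int, height: int) -> float:
    """Bit budget implied by a bits-per-pixel (BPP) rate for one image."""
    return bpp * width * height


def bpp_for_collection(total_bits: float, sizes: list[tuple[int, int]]) -> float:
    """Average BPP when total_bits are spent on a whole collection of images."""
    total_pixels = sum(w * h for w, h in sizes)
    return total_bits / total_pixels


# Assumed resolution (not taken from the paper): a 512x512 image.
budget_512 = bits_for_image(1e-5, 512, 512)
print(f"512x512 at 1e-5 BPP: about {budget_512:.2f} bits per image")

# Assumed collection: 1000 images of 512x512 sharing 3000 bits in total.
collection_bpp = bpp_for_collection(3000, [(512, 512)] * 1000)
print(f"collection rate: {collection_bpp:.2e} BPP")
```

At this order of magnitude, each image receives only a handful of bits, which is why the cost has to be shared across the collection rather than paid per item.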
Abstract
Semantic compression, a compression scheme in which the usual distortion metric (typically MSE) is replaced with semantic fidelity metrics, is becoming increasingly popular. Most recent semantic compression schemes rely on the foundation model CLIP. In this work, we extend such a scheme to image collection compression, where inter-item redundancy is taken into account during the coding phase. To that end, we first show that CLIP's latent space allows for easy semantic additions and subtractions. Building on this property, we define a dictionary-based multi-item codec that outperforms state-of-the-art generative codecs in terms of compression rate, reaching around 10^-5 BPP per image, while not sacrificing semantic fidelity. We also show that the learned dictionary is semantic in nature and acts as a semantic projector for the content of images.
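The idea of coding a latent vector as additions and subtractions of shared dictionary entries can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the vectors are synthetic stand-ins for CLIP embeddings, the dictionary is random rather than learned, and plain matching pursuit is used as a generic sparse coder. It is not presented as the authors' algorithm.

```python
import numpy as np


def normalize(x, axis=-1):
    """Scale vectors to unit L2 norm along the given axis."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


def matching_pursuit(z, dictionary, n_atoms):
    """Greedy sparse code of z over the unit-norm rows of `dictionary`.

    Returns the selected row indices and their signed coefficients.
    Positive coefficients act as additions, negative ones as subtractions.
    """
    residual = z.copy()
    indices, coeffs = [], []
    for _ in range(n_atoms):
        scores = dictionary @ residual           # correlation with each atom
        k = int(np.argmax(np.abs(scores)))       # best-matching atom
        indices.append(k)
        coeffs.append(float(scores[k]))
        residual = residual - scores[k] * dictionary[k]
    return indices, coeffs


def reconstruct(indices, coeffs, dictionary):
    """Rebuild the latent vector from its sparse code."""
    return np.asarray(coeffs) @ dictionary[indices]


rng = np.random.default_rng(0)
dim, n_words = 64, 32                            # hypothetical sizes
dictionary = normalize(rng.normal(size=(n_words, dim)))

# Synthetic "image" latent: two dictionary atoms added, one partly subtracted.
z = dictionary[3] + dictionary[7] - 0.5 * dictionary[11]

indices, coeffs = matching_pursuit(z, dictionary, n_atoms=8)
z_hat = reconstruct(indices, coeffs, dictionary)
cosine = float(normalize(z) @ normalize(z_hat))
print("selected atoms:", indices)
print("cosine similarity to original latent:", round(cosine, 3))
```

In a multi-item setting, the dictionary would be stored once for the whole collection, so each item only costs the indices and coefficients of its code. Cosine similarity is used here as a simple proxy for semantic fidelity in the latent space.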
Taxonomy
Topics: Natural Language Processing Techniques · Algorithms and Data Compression · Semantic Web and Ontologies
Methods: Contrastive Language-Image Pre-training
