Tell Codec What Worth Compressing: Semantically Disentangled Image Coding for Machine with LMMs

Jinming Liu; Yuntao Wei; Junyan Lin; Shengyang Zhao; Heming Sun; Zhibo Chen; Wenjun Zeng; Xin Jin

arXiv:2408.08575·cs.CV·August 19, 2024

TL;DR

This paper introduces SDComp, a novel image compression method that leverages Large Multimodal Models to encode images based on semantic importance, optimizing for machine understanding rather than human perception.

Contribution

The paper proposes a new semantic-aware image coding framework that uses LMMs to guide compression according to downstream task relevance, departing from traditional codecs that are optimized for human perception.

Findings

01

Supports diverse vision tasks with structured bitstreams

02

Achieves better task performance than state-of-the-art codecs

03

Provides flexible and semantically meaningful image reconstructions
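The first finding mentions structured bitstreams that serve diverse vision tasks. This page does not describe SDComp's actual bitstream format, so what follows is only a minimal sketch of one generic way such a stream could be organized. It assumes each object's compressed data is stored as a separate segment, ordered by importance rank (0 = most important), so a task that needs only the top objects can stop reading early. The names `Segment`, `pack`, and `unpack_prefix`, as well as the 6-byte header layout, are hypothetical and are not taken from the paper.

```python
import struct
from dataclasses import dataclass

# Hypothetical header: label length (1 byte), rank (1 byte), payload length (4 bytes, big-endian).
HEADER = ">BBI"


@dataclass
class Segment:
    label: str      # hypothetical semantic label, e.g. "person"
    rank: int       # importance rank, 0 = most important
    payload: bytes  # stand-in for the compressed data of this object's region


def pack(segments: list[Segment]) -> bytes:
    """Concatenate segments in importance order, each preceded by a small header."""
    out = bytearray()
    for seg in sorted(segments, key=lambda s: s.rank):
        label = seg.label.encode("utf-8")
        out += struct.pack(HEADER, len(label), seg.rank, len(seg.payload))
        out += label + seg.payload
    return bytes(out)


def unpack_prefix(stream: bytes, max_segments: int) -> list[Segment]:
    """Parse only the first `max_segments` segments and ignore the rest of the stream."""
    header_size = struct.calcsize(HEADER)
    segments: list[Segment] = []
    offset = 0
    while offset < len(stream) and len(segments) < max_segments:
        label_len, rank, payload_len = struct.unpack_from(HEADER, stream, offset)
        offset += header_size
        label = stream[offset:offset + label_len].decode("utf-8")
        offset += label_len
        payload = stream[offset:offset + payload_len]
        offset += payload_len
        segments.append(Segment(label, rank, payload))
    return segments


# Usage: a task that only needs the two most important objects reads a prefix.
segs = [
    Segment("background", 2, b"\x00" * 8),
    Segment("person", 0, b"\x01\x02"),
    Segment("dog", 1, b"\x03"),
]
stream = pack(segs)
print([s.label for s in unpack_prefix(stream, 2)])  # ['person', 'dog']
```

Ordering segments by rank is the design choice that makes partial decoding cheap: a lightweight task can consume a short prefix, while a task that needs the full scene keeps reading.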

Abstract

We present a new image compression paradigm to achieve "intelligently coding for machines" by cleverly leveraging the common sense of Large Multimodal Models (LMMs). We are motivated by the evidence that large language/multimodal models are powerful general-purpose semantics predictors for understanding the real world. Unlike traditional image compression, which is typically optimized for human eyes, the image coding for machines (ICM) framework we focus on requires the compressed bitstream to better serve a variety of downstream intelligent analysis tasks. To this end, we employ LMMs to tell the codec what to compress: 1) we first utilize the powerful semantic understanding capability of LMMs w.r.t. object grounding, identification, and importance ranking via prompts to disentangle image content before compression, and 2) then, based on these semantic priors, we accordingly…
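The abstract describes using LMM outputs (object grounding, identification, importance ranking) to disentangle image content before compression, but it is truncated before explaining how the priors drive the encoder. The following is therefore only a minimal sketch under stated assumptions: the LMM's answer has already been parsed into bounding boxes, labels, and ranks (0 = most important); each pixel is assigned to the most important object covering it; and less important regions get coarser quantization through a geometric schedule. `GroundedObject`, `region_map`, `quant_steps`, and all default parameter values are hypothetical and are not the authors' SDComp implementation.

```python
from dataclasses import dataclass


@dataclass
class GroundedObject:
    label: str
    box: tuple[int, int, int, int]  # (x0, y0, x1, y1), upper bounds exclusive
    rank: int                       # importance rank from the LMM, 0 = most important


def region_map(width: int, height: int, objects: list[GroundedObject]) -> list[list[int]]:
    """Assign each pixel the index of the most important object covering it (-1 = background)."""
    owner = [[-1] * width for _ in range(height)]
    # Paint least important objects first so more important ones overwrite overlaps.
    for idx, obj in sorted(enumerate(objects), key=lambda p: -p[1].rank):
        x0, y0, x1, y1 = obj.box
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                owner[y][x] = idx
    return owner


def quant_steps(objects: list[GroundedObject], base_step: float = 1.0,
                growth: float = 2.0, background_step: float = 16.0) -> dict[int, float]:
    """Hypothetical rank-to-step schedule: each rank down doubles the quantization step."""
    steps = {i: base_step * growth ** obj.rank for i, obj in enumerate(objects)}
    steps[-1] = background_step  # background is coded most coarsely
    return steps


# Usage: a person (rank 0) overlapping a car (rank 1) on a 4x3 image.
objs = [
    GroundedObject("person", (0, 0, 2, 2), 0),
    GroundedObject("car", (1, 1, 4, 3), 1),
]
for row in region_map(4, 3, objs):
    print(row)
print(quant_steps(objs))
```

Resolving overlaps in favor of the higher-ranked object means each pixel is coded exactly once, at the quality its most important owner calls for; a real codec would replace the uniform step schedule with whatever rate control the paper actually uses.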

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis

Methods: Focus