Unified Coding for Both Human Perception and Generalized Machine Analytics with CLIP Supervision

Kangsheng Yin; Quan Liu; Xuelin Shen; Yulin He; Wenhan Yang; Shiqi Wang

arXiv:2501.04579·cs.CV·January 9, 2025

TL;DR

This paper introduces UG-ICM, a unified image coding model that uses CLIP supervision and adaptive decoding to serve both human perception and machine analytics from a single bitstream, without relying on downstream task supervision, which improves generalization and versatility.

Contribution

The paper proposes a unified coding framework that uses CLIP-based supervision and conditional decoding to support both human and machine vision tasks without task-specific training.

Findings

01. Achieves significant improvements in unseen machine analytics tasks.

02. Provides perceptually satisfying images for human viewers.

03. Supports dual-purpose decoding with a single bitstream.

Abstract

Image compression models have long struggled with adaptability and generalization, as the decoded bitstream typically serves only human or machine needs and fails to preserve information for unseen visual tasks. This paper therefore introduces supervision obtained from multimodal pre-training models and incorporates adaptive multi-objective optimization to support both human visual perception and machine vision simultaneously with a single bitstream; the approach is denoted Unified and Generalized Image Coding for Machine (UG-ICM). Specifically, to remove the dependence of compression models on downstream task supervision, Contrastive Language-Image Pre-training (CLIP) models are introduced into the training constraint for improved generalization. Global-to-instance-wise CLIP supervision is applied to help obtain hierarchical semantics that make models more…
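The idea of global-to-instance-wise CLIP supervision can be illustrated with a minimal sketch. This is not the authors' loss: the abstract does not give its form, so the cosine-distance formulation, the function names (`semantic_loss`, `global_to_instance_loss`), and the `instance_weight` parameter are all assumptions made for illustration. The embeddings are plain lists of floats standing in for CLIP image features of the original and decoded image (global) and of cropped object regions (instance).

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero vectors")
    return dot / (norm_a * norm_b)


def semantic_loss(emb_original, emb_decoded):
    """Assumed loss: cosine distance between original and decoded embeddings."""
    return 1.0 - cosine_similarity(emb_original, emb_decoded)


def global_to_instance_loss(global_pair, instance_pairs, instance_weight=1.0):
    """Hypothetical combination of a whole-image term and a per-instance term.

    global_pair:    (original_embedding, decoded_embedding) for the full image.
    instance_pairs: list of such pairs, one per cropped object region.
    instance_weight: assumed weighting factor, not taken from the paper.
    """
    loss = semantic_loss(*global_pair)
    if instance_pairs:
        per_instance = [semantic_loss(o, d) for o, d in instance_pairs]
        loss += instance_weight * sum(per_instance) / len(per_instance)
    return loss


# Toy embeddings: the global pair matches, one instance pair is orthogonal.
example = global_to_instance_loss(
    global_pair=([1.0, 0.0], [1.0, 0.0]),
    instance_pairs=[([1.0, 0.0], [0.0, 1.0])],
)
```

In a real training setup the embeddings would come from a frozen CLIP image encoder, and a term like this would presumably be weighted against rate and pixel-level distortion terms; those details are not stated in the abstract and are left out here.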

Code & Models

Repositories

yinkangsheng/ug-icm (PyTorch, official)

Videos

Unified Coding for Both Human Perception and Generalized Machine Analytics with CLIP Supervision · underline

Taxonomy

Topics: Neural Networks and Applications · Digital Image Processing Techniques · CCD and CMOS Imaging Sensors

Methods: Contrastive Language-Image Pre-training