Perception-Oriented Latent Coding for High-Performance Compressed Domain Semantic Inference

Xu Zhang; Ming Lu; Yan Chen; Zhan Ma

arXiv:2507.01608·cs.CV·July 3, 2025

TL;DR

This paper introduces Perception-Oriented Latent Coding (POLC), a novel approach that enhances semantic richness in compressed domain inference, enabling high-performance vision tasks with minimal fine-tuning and reduced computational costs.

Contribution

POLC enriches the semantic content of latent features for compressed domain inference. With this richer latent space, only a plug-and-play adapter needs fine-tuning, whereas previous MSE-oriented methods often require fine-tuning the entire vision model.

Findings

01. POLC achieves state-of-the-art rate-perception performance.

02. POLC significantly improves vision task accuracy in the compressed domain.

03. Minimal fine-tuning is needed for high performance.

Abstract

In recent years, compressed domain semantic inference has primarily relied on learned image coding models optimized for mean squared error (MSE). However, MSE-oriented optimization tends to yield latent spaces with limited semantic richness, which hinders effective semantic inference in downstream tasks. Moreover, achieving high performance with these models often requires fine-tuning the entire vision model, which is computationally intensive, especially for large models. To address these problems, we introduce Perception-Oriented Latent Coding (POLC), an approach that enriches the semantic content of latent features for high-performance compressed domain semantic inference. With the semantically rich latent space, POLC requires only a plug-and-play adapter for fine-tuning, significantly reducing the parameter count compared to previous MSE-oriented methods. Experimental results…
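The abstract's claim that adapter-only fine-tuning cuts the trainable parameter count can be illustrated with simple arithmetic. The sketch below is not the POLC architecture: it assumes a generic bottleneck adapter (down-projection, nonlinearity, up-projection) placed in front of a frozen transformer-style backbone, and every size used (width 768, depth 12, bottleneck 64) is a hypothetical value chosen for illustration, not a figure from the paper.

```python
# Illustrative parameter-count comparison: adapter-only vs. full fine-tuning.
# All layer shapes and sizes are assumptions, not taken from the POLC paper.


def linear_params(in_features: int, out_features: int, bias: bool = True) -> int:
    """Parameter count of a dense layer mapping in_features -> out_features."""
    return in_features * out_features + (out_features if bias else 0)


def bottleneck_adapter_params(dim: int, bottleneck: int) -> int:
    """Generic adapter: down-projection, parameter-free nonlinearity, up-projection."""
    return linear_params(dim, bottleneck) + linear_params(bottleneck, dim)


def transformer_block_params(dim: int, mlp_ratio: int = 4) -> int:
    """Rough count for one standard transformer block."""
    # Attention: fused QKV projection plus output projection.
    attention = linear_params(dim, 3 * dim) + linear_params(dim, dim)
    # Two-layer MLP with hidden width mlp_ratio * dim.
    mlp = linear_params(dim, mlp_ratio * dim) + linear_params(mlp_ratio * dim, dim)
    # Two LayerNorms, each with a scale and a shift vector.
    layer_norms = 2 * (2 * dim)
    return attention + mlp + layer_norms


def trainable_fraction(dim: int, depth: int, bottleneck: int) -> float:
    """Share of all parameters that are trained when only the adapter is updated."""
    backbone = depth * transformer_block_params(dim)  # frozen
    adapter = bottleneck_adapter_params(dim, bottleneck)  # trained
    return adapter / (backbone + adapter)


if __name__ == "__main__":
    dim, depth, bottleneck = 768, 12, 64  # hypothetical sizes
    print("adapter parameters:", bottleneck_adapter_params(dim, bottleneck))
    print("backbone parameters:", depth * transformer_block_params(dim))
    print("trainable fraction:", trainable_fraction(dim, depth, bottleneck))
```

Under these assumed sizes the adapter accounts for well under one percent of the total parameters, which is the kind of saving the abstract attributes to fine-tuning an adapter instead of the entire vision model. The actual ratio for POLC depends on the adapter and backbone the authors use.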

Code & Models

Repositories

NJUVISION/POLC (PyTorch, official)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Face recognition and analysis

Methods: Adapter