Perception-Oriented Latent Coding for High-Performance Compressed Domain Semantic Inference
Xu Zhang, Ming Lu, Yan Chen, Zhan Ma

TL;DR
This paper introduces Perception-Oriented Latent Coding (POLC), which enriches the semantic content of latent features so that vision tasks can run in the compressed domain with high performance, while fine-tuning only a plug-and-play adapter rather than the entire vision model.
Contribution
POLC enriches latent features for better semantic inference in the compressed domain. It requires fine-tuning only a plug-and-play adapter, whereas MSE-oriented methods often require fine-tuning the entire vision model.
Findings
POLC achieves state-of-the-art rate-perception performance.
POLC significantly improves vision task accuracy in the compressed domain.
Minimal fine-tuning is needed for high performance.
Abstract
In recent years, compressed domain semantic inference has primarily relied on learned image coding models optimized for mean squared error (MSE). However, MSE-oriented optimization tends to yield latent spaces with limited semantic richness, which hinders effective semantic inference in downstream tasks. Moreover, achieving high performance with these models often requires fine-tuning the entire vision model, which is computationally intensive, especially for large models. To address these problems, we introduce Perception-Oriented Latent Coding (POLC), an approach that enriches the semantic content of latent features for high-performance compressed domain semantic inference. With the semantically rich latent space, POLC requires only a plug-and-play adapter for fine-tuning, significantly reducing the parameter count compared to previous MSE-oriented methods. Experimental results…
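To make the parameter-count argument concrete, the sketch below compares the trainable parameters of a small adapter against those of a full vision backbone. It is an illustration under stated assumptions, not the paper's architecture: the bottleneck adapter design, the ViT-like backbone layout, and every dimension (320, 256, 768, 3072, 12) are hypothetical values chosen only for the example, and layer norms, embeddings, and task heads are ignored.

```python
# Rough parameter-count comparison: adapter-only vs. full fine-tuning.
# All sizes below are hypothetical and do NOT come from the POLC paper.


def linear_params(d_in: int, d_out: int, bias: bool = True) -> int:
    """Parameters of one fully connected layer (weights plus optional bias)."""
    return d_in * d_out + (d_out if bias else 0)


def bottleneck_adapter_params(latent_dim: int, hidden_dim: int, model_dim: int) -> int:
    """Assumed adapter: project latent channels down, then up to the backbone width."""
    down = linear_params(latent_dim, hidden_dim)
    up = linear_params(hidden_dim, model_dim)
    return down + up


def backbone_params(model_dim: int, mlp_dim: int, depth: int) -> int:
    """Assumed ViT-like backbone: per block, 4 attention projections and a 2-layer MLP."""
    attention = 4 * linear_params(model_dim, model_dim)
    mlp = linear_params(model_dim, mlp_dim) + linear_params(mlp_dim, model_dim)
    return depth * (attention + mlp)


# Hypothetical configuration: 320 latent channels, 768-wide backbone with 12 blocks.
adapter = bottleneck_adapter_params(latent_dim=320, hidden_dim=256, model_dim=768)
backbone = backbone_params(model_dim=768, mlp_dim=3072, depth=12)
ratio = adapter / backbone

print(f"adapter parameters:  {adapter:,}")
print(f"backbone parameters: {backbone:,}")
print(f"adapter / backbone:  {ratio:.4%}")
```

Under these assumed sizes the adapter holds well under 1% of the backbone's parameters, which is the kind of gap that makes adapter-only fine-tuning attractive when the vision model is large. The actual savings reported for POLC depend on its real adapter and backbone, which this summary does not specify.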
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Face recognition and analysis
Methods: Adapter
