ML-CLIPSim: Multi-Layer CLIP Similarity for Machine-Oriented Image Quality

Feng Ding; Haisheng Fu; Jie Liang; Qihan Xu; Siyu Zhu; Jingning Han

arXiv:2605.09479 · eess.IV · May 12, 2026

TL;DR

This paper introduces ML-CLIPSim, a differentiable full-reference image quality metric for machine-centric evaluation. It aligns with machine preferences better than conventional fidelity and perceptual metrics and, when used in learned image compression, improves rate-task trade-offs.

Contribution

The paper proposes ML-CLIPSim, a multi-layer CLIP-based similarity metric for machine-oriented image quality assessment, and constructs PCMP, a dataset of PSNR-matched distortion pairs labeled by predictive-consistency votes from multiple pretrained models.
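
As a rough illustration of how a PCMP-style pair could be labeled, the sketch below checks that two distortions of the same reference have nearly equal PSNR and then collects one vote per model on which distortion better preserves that model's prediction. The function names, the 0.1 dB tolerance, and the top-1 agreement rule are assumptions made for this example; the paper's actual labeling protocol may differ.

```python
import numpy as np


def psnr(ref, img, peak=255.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    mse = np.mean((ref.astype(np.float64) - img.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))


def is_psnr_matched(ref, dist_a, dist_b, tol_db=0.1):
    """True if both distortions sit within tol_db of each other (tolerance is hypothetical)."""
    return abs(psnr(ref, dist_a) - psnr(ref, dist_b)) <= tol_db


def consistency_votes(ref_labels, labels_a, labels_b):
    """One vote per model: +1 if only A matches the reference prediction,
    -1 if only B matches, 0 if both or neither match (assumed voting rule)."""
    return [int(a == r) - int(b == r)
            for r, a, b in zip(ref_labels, labels_a, labels_b)]


def preferred(votes):
    """Aggregate the per-model votes into a pairwise preference label."""
    total = sum(votes)
    if total > 0:
        return "A"
    if total < 0:
        return "B"
    return "tie"
```

Here each entry of `ref_labels`, `labels_a`, and `labels_b` stands for one pretrained model's top-1 prediction on the reference and on the two distorted images. Because the pair is PSNR-matched, any preference that emerges reflects something PSNR alone does not capture.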

Findings

01

ML-CLIPSim aligns better with machine preferences than traditional metrics.

02

Using ML-CLIPSim as a training term in learned image compression improves rate-task trade-offs.

03

ML-CLIPSim remains competitive for human quality prediction.

Abstract

We study full-reference image quality assessment from a machine-centric perspective, where images are evaluated by how well they preserve information for downstream models. We formulate machine-oriented quality as a latent machine utility and approximate it through pairwise predictive-consistency comparisons. To this end, we construct PCMP, a dataset of PSNR-matched distortion pairs labeled by consistency votes from multiple pretrained models. We further propose ML-CLIPSim, a differentiable quality metric built on a frozen CLIP visual encoder, which aggregates intermediate patch-token similarities and global image embeddings. Experiments on machine-preference benchmarks, human-IQA datasets, and learned image compression show that ML-CLIPSim better aligns with machine-oriented preferences than conventional fidelity and perceptual metrics, while remaining competitive for human quality…
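
The aggregation described in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes cosine similarity between corresponding patch tokens, uniform layer weights by default, and a hypothetical mixing weight `alpha` between the patch-token term and the global-embedding term. Plain NumPy arrays stand in for features that would come from a frozen CLIP visual encoder.

```python
import numpy as np


def cosine(a, b, eps=1e-8):
    """Cosine similarity along the last axis."""
    num = np.sum(a * b, axis=-1)
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + eps
    return num / den


def multilayer_similarity(ref_tokens, dist_tokens, ref_global, dist_global,
                          layer_weights=None, alpha=0.5):
    """Combine per-layer patch-token similarity with global-embedding similarity.

    ref_tokens / dist_tokens: sequences of [num_patches, dim] arrays, one per
    selected intermediate layer (the choice of layers is an assumption).
    ref_global / dist_global: [dim] global image embeddings.
    alpha: hypothetical weight on the patch-token term.
    """
    # Mean cosine similarity of corresponding patch tokens, per layer.
    per_layer = np.array([cosine(r, d).mean()
                          for r, d in zip(ref_tokens, dist_tokens)])
    if layer_weights is None:
        # Uniform weighting is assumed here; learned weights are another option.
        layer_weights = np.full(len(per_layer), 1.0 / len(per_layer))
    patch_term = float(np.dot(layer_weights, per_layer))
    global_term = float(cosine(ref_global, dist_global))
    return alpha * patch_term + (1.0 - alpha) * global_term
```

Every operation in this sketch is differentiable, so the same structure written in an autodiff framework could serve as a training signal, for example with one minus the score used as a loss.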
