Beyond Cosine Similarity: Magnitude-Aware CLIP for No-Reference Image Quality Assessment
Zhicheng Liao, Dongxu Wu, Zhenshan Shi, Sijie Mai, Hanwei Zhu, Lingyu Zhu, Yuncheng Jiang, Baoliang Chen

TL;DR
This paper enhances CLIP-based no-reference image quality assessment by incorporating a magnitude-aware cue and adaptive fusion, significantly improving performance without additional training.
Contribution
Introduces a magnitude-aware auxiliary cue and a confidence-guided fusion scheme that combines it with cosine similarity to improve CLIP-based IQA performance.
Findings
Outperforms standard CLIP-based IQA methods on multiple benchmarks.
A Box-Cox transformation statistically normalizes the distribution of the absolute CLIP image features.
The proposed method requires no task-specific training.
Abstract
Recent efforts have repurposed the Contrastive Language-Image Pre-training (CLIP) model for No-Reference Image Quality Assessment (NR-IQA) by measuring the cosine similarity between the image embedding and textual prompts such as "a good photo" or "a bad photo." However, this semantic similarity overlooks a critical yet underexplored cue: the magnitude of the CLIP image features, which we empirically find to exhibit a strong correlation with perceptual quality. In this work, we introduce a novel adaptive fusion framework that complements cosine similarity with a magnitude-aware quality cue. Specifically, we first extract the absolute CLIP image features and apply a Box-Cox transformation to statistically normalize the feature distribution and mitigate semantic sensitivity. The resulting scalar summary serves as a semantically-normalized auxiliary cue that complements cosine-based prompt…
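The two cues described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the softmax temperature, the Box-Cox parameter `lam`, the small `eps` offset, and the choice of a mean as the "scalar summary" are all assumptions made for illustration, and the embeddings are plain lists standing in for real CLIP outputs. The fusion step is omitted because the abstract excerpt above does not describe it.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_quality_score(image_emb, good_emb, bad_emb, temperature=0.01):
    """Prompt-based score: softmax over similarities to the two prompts.

    good_emb / bad_emb stand in for text embeddings of "a good photo"
    and "a bad photo". The temperature value is an assumption.
    """
    s_good = cosine(image_emb, good_emb) / temperature
    s_bad = cosine(image_emb, bad_emb) / temperature
    m = max(s_good, s_bad)  # subtract the max for numerical stability
    e_good = math.exp(s_good - m)
    e_bad = math.exp(s_bad - m)
    return e_good / (e_good + e_bad)


def box_cox(x, lam):
    """Box-Cox transform of a strictly positive value."""
    if x <= 0:
        raise ValueError("Box-Cox requires x > 0")
    if abs(lam) < 1e-12:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def magnitude_cue(image_emb, lam=0.5, eps=1e-6):
    """Hypothetical scalar summary of feature magnitudes.

    Takes absolute feature values, applies Box-Cox, and averages.
    Averaging, lam, and eps are illustrative assumptions only.
    """
    transformed = [box_cox(abs(v) + eps, lam) for v in image_emb]
    return sum(transformed) / len(transformed)


if __name__ == "__main__":
    image = [0.8, -0.3, 0.5]   # toy, unnormalized image embedding
    good = [1.0, 0.0, 0.4]     # toy "a good photo" embedding
    bad = [-1.0, 0.2, -0.4]    # toy "a bad photo" embedding
    print("cosine score:", cosine_quality_score(image, good, bad))
    print("magnitude cue:", magnitude_cue(image))
```

Note that cosine similarity discards the norm of the image embedding by construction, which is why a magnitude-based cue can carry information that the prompt-based score cannot.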
Taxonomy
Topics: Image and Video Quality Assessment · Visual Attention and Saliency Detection · Advanced Image Fusion Techniques
