BPCLIP: A Bottom-up Image Quality Assessment from Distortion to Semantics Based on CLIP
Chenyue Song, Chen Hui, Wei Zhang, Haiqi Zhu, Shaohui Liu, Hong Huang, Feng Jiang

TL;DR
BPCLIP introduces a bottom-up image quality assessment method leveraging CLIP's joint image-text space, capturing the impact of distortions on semantics and outperforming existing IQA models on multiple benchmarks.
Contribution
Proposes a bottom-up IQA approach built on CLIP that relates multiscale image features through cross attention and integrates semantic quality descriptors into the assessment.
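As a rough illustration of how text quality descriptors can drive a score in a joint image-text space, the sketch below compares an image embedding against embeddings of descriptor prompts and takes a softmax-weighted average of their quality levels. This is a generic CLIP-style scoring scheme and not necessarily the one BPCLIP uses; the function names, the temperature value, the descriptor levels, and the toy two-dimensional embeddings are all assumptions made for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def descriptor_quality_score(image_emb, descriptor_embs, levels, temperature=0.01):
    """Softmax-weighted average of quality levels (hypothetical scoring rule).

    image_emb: embedding of the image in the joint space.
    descriptor_embs: one embedding per quality descriptor (e.g. "bad", "good").
    levels: numeric quality level attached to each descriptor.
    temperature: assumed softmax temperature, not taken from the paper.
    """
    logits = [cosine(image_emb, t) / temperature for t in descriptor_embs]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return sum(p * lvl for p, lvl in zip(probs, levels))

# Toy embeddings standing in for real encoder outputs.
bad, good = [0.0, 1.0], [1.0, 0.0]
score = descriptor_quality_score([1.0, 0.0], [bad, good], levels=[1.0, 5.0])
```

In practice the embeddings would come from a pretrained image encoder and text encoder; the toy vectors here only make the arithmetic easy to follow.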
Findings
Achieves superior performance on public IQA benchmarks.
Demonstrates robustness across different distortion types.
Effectively links low-level distortions to high-level semantics.
Abstract
Image Quality Assessment (IQA) aims to evaluate the perceptual quality of images based on human subjective perception. Existing methods generally combine multiscale features to achieve high performance, but most rely on straightforward linear fusion of these features, which may not adequately capture the impact of distortions on semantic content. To address this, we propose BPCLIP, a bottom-up image quality assessment approach based on Contrastive Language-Image Pre-training (CLIP), a model that aligns images and text in a shared feature space. BPCLIP progressively extracts the impact of low-level distortions on high-level semantics. Specifically, we utilize an encoder to extract multiscale features from the input image and introduce a bottom-up multiscale cross attention module designed to capture the relationships between shallow and deep features. In…
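The bottom-up cross attention idea described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' module: it omits learned query/key/value projections, assumes all scales share one feature dimension, assumes the deeper scale supplies the queries while the shallower scale supplies keys and values, and adds a residual connection. The names `cross_attention` and `bottom_up_fuse` are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys_values):
    """Scaled dot-product cross attention without learned projections.

    queries: list of query tokens, each a list of d floats.
    keys_values: list of tokens used as both keys and values (same d).
    Returns one attended vector of length d per query token.
    """
    d = len(queries[0])
    attended = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys_values]
        weights = softmax(scores)
        attended.append([sum(w * v[j] for w, v in zip(weights, keys_values))
                         for j in range(d)])
    return attended

def bottom_up_fuse(scales):
    """Fuse token sets ordered shallow -> deep (assumed ordering).

    At each step the deeper tokens attend to the features carried up from
    below, and the result is added back to the deeper tokens (residual).
    """
    carried = scales[0]
    for deeper in scales[1:]:
        attended = cross_attention(deeper, carried)
        carried = [[x + a for x, a in zip(tok, att)]
                   for tok, att in zip(deeper, attended)]
    return carried

# Toy input: two shallow tokens and one deep token, feature dimension 2.
fused = bottom_up_fuse([[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]]])
```

Compared with linear fusion (for example, concatenating or summing scales), the attention weights here depend on the content of both scales, which is the property the abstract argues is needed to relate shallow distortion cues to deeper semantic features.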
