Multi-Modal Prompt Learning on Blind Image Quality Assessment

Wensheng Pan; Timin Gao; Yan Zhang; Runze Hu; Xiawu Zheng; Enwei Zhang; Yuting Gao; Yutao Liu; Yunhang Shen; Ke Li; Shengchuan Zhang; Liujuan Cao; Rongrong Ji

arXiv:2404.14949·cs.CV·May 21, 2024

Open Access · 1 Repo

TL;DR

This paper presents a multi-modal prompt learning approach that enhances blind image quality assessment by effectively leveraging visual and linguistic semantic information, leading to superior performance across multiple datasets.

Contribution

The paper introduces a multi-modal prompt-based method that combines multi-layer visual prompts with dual textual prompts to improve IQA accuracy over existing models.

Findings

01. Achieves an SRCC (Spearman rank-order correlation coefficient) of 0.961 on the CSIQ dataset.

02. Attains an SRCC of 0.941 on the KADID dataset.

03. Outperforms existing BIQA methods in robustness and accuracy.
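
The SRCC figures above measure how well a model's predicted quality scores preserve the ranking of human opinion scores. As a rough illustration only, the sketch below computes Spearman's rank correlation in plain Python by ranking both score lists (ties share an average rank) and taking the Pearson correlation of the ranks. The function names `average_ranks` and `srcc` and the sample scores are hypothetical and do not come from the paper or its repository.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def srcc(predicted: Sequence[float], subjective: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the two rank lists."""
    if len(predicted) != len(subjective) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx = average_ranks(predicted)
    ry = average_ranks(subjective)
    n = len(rx)
    mean_x = sum(rx) / n
    mean_y = sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("SRCC is undefined when one sequence is constant")
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Made-up scores for illustration; not data from the paper.
    model_scores = [0.91, 0.40, 0.75, 0.10]
    opinion_scores = [88.0, 35.0, 70.0, 12.0]
    print(srcc(model_scores, opinion_scores))
```

Because only the ordering matters, any monotonic rescaling of the predicted scores leaves the SRCC unchanged, which is why it is a common headline metric for comparing BIQA models across datasets with different score ranges.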

Abstract

Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly. Currently, leveraging semantic information to enhance IQA is a crucial research direction. Traditional methods, hindered by a lack of sufficiently annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semantic awareness. However, the generalist nature of these pre-trained Vision-Language (VL) models often renders them suboptimal for IQA-specific tasks. Recent approaches have attempted to address this mismatch using prompt technology, but these solutions have shortcomings. Existing prompt-based VL models overly focus on incremental semantic information from text, neglecting the rich insights available from visual data analysis. This imbalance limits their performance improvements in IQA tasks. This…

Code & Models

Repositories

stephencurry-web/IQA
PyTorch · Official

Taxonomy

Topics: Industrial Vision Systems and Defect Detection · Advanced Image Fusion Techniques · Image Processing Techniques and Applications

Methods: Focus · Contrastive Language-Image Pre-training