Multi-Modal Prompt Learning on Blind Image Quality Assessment
Wensheng Pan, Timin Gao, Yan Zhang, Runze Hu, Xiawu Zheng, Enwei Zhang, Yuting Gao, Yutao Liu, Yunhang Shen, Ke Li, Shengchuan Zhang, Liujuan Cao, Rongrong Ji

TL;DR
This paper presents a multi-modal prompt learning approach that enhances blind image quality assessment by effectively leveraging visual and linguistic semantic information, leading to superior performance across multiple datasets.
Contribution
It introduces a novel multi-modal prompt-based method with multi-layer visual prompts and dual textual prompts to improve IQA accuracy over existing models.
Findings
Achieves a Spearman rank correlation coefficient (SRCC) of 0.961 on the CSIQ dataset.
Attains an SRCC of 0.941 on the KADID dataset.
Outperforms existing BIQA methods in robustness and accuracy.
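For readers unfamiliar with the metric, the SRCC figures above measure how well a model's predicted quality scores preserve the rank order of human opinion scores. The following is a minimal sketch of that computation in plain Python, not the authors' evaluation code. The function names (`average_ranks`, `srcc`) and the sample scores are hypothetical and chosen only for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def srcc(predicted, subjective):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    n = len(predicted)
    if n != len(subjective) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rp = average_ranks(predicted)
    rs = average_ranks(subjective)
    mean_p = sum(rp) / n
    mean_s = sum(rs) / n
    cov = sum((a - mean_p) * (b - mean_s) for a, b in zip(rp, rs))
    var_p = sum((a - mean_p) ** 2 for a in rp)
    var_s = sum((b - mean_s) ** 2 for b in rs)
    if var_p == 0 or var_s == 0:
        raise ValueError("SRCC is undefined when one input is constant")
    return cov / sqrt(var_p * var_s)


# Hypothetical data: model outputs and mean opinion scores for five images.
predicted = [0.82, 0.35, 0.60, 0.91, 0.15]
mos = [4.1, 2.0, 3.6, 4.5, 1.2]
print(srcc(predicted, mos))  # 1.0, because both lists order the images identically
```

Because SRCC depends only on ranks, it is insensitive to the scale of the predictions: a model that outputs values in [0, 1] can be compared directly against opinion scores on a 1 to 5 scale.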
Abstract
Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly. Currently, leveraging semantic information to enhance IQA is a crucial research direction. Traditional methods, hindered by a lack of sufficiently annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semantic awareness. However, the generalist nature of these pre-trained Vision-Language (VL) models often renders them suboptimal for IQA-specific tasks. Recent approaches have attempted to address this mismatch using prompt technology, but these solutions have shortcomings. Existing prompt-based VL models overly focus on incremental semantic information from text, neglecting the rich insights available from visual data analysis. This imbalance limits their performance improvements in IQA tasks. This…
Taxonomy
Topics: Industrial Vision Systems and Defect Detection · Advanced Image Fusion Techniques · Image Processing Techniques and Applications
Methods: Focus · Contrastive Language-Image Pre-training
