Vision-Language Consistency Guided Multi-modal Prompt Learning for Blind AI Generated Image Quality Assessment
Jun Fu, Wei Zhou, Qiuping Jiang, Hantao Liu, Guangtao Zhai

TL;DR
This paper introduces a multi-modal prompt learning approach for blind AI-generated image quality assessment (AGIQA). It tunes both the language and vision branches of CLIP, and uses the consistency between a generated image and its text prompt to guide quality prediction.
Contribution
It proposes a multi-modal prompt learning framework with vision-language consistency guidance for AI-generated image quality assessment, addressing the limitation that uni-modal prompt tuning adapts only the language branch of CLIP.
Findings
Outperforms state-of-the-art AGIQA models on public datasets.
Utilizes learnable prompts in both language and vision branches of CLIP.
Leverages vision-language alignment to improve quality prediction accuracy.
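The idea of learnable prompts in both branches can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration: `toy_encoder` is a hypothetical stand-in for a frozen CLIP encoder, the random vectors stand in for trained prompt parameters and token embeddings, and the scoring rule (a softmax over similarity to a "good" and a "bad" text feature) is a common convention in CLIP-based quality assessment, not something this summary specifies for CLIP-AGIQA.

```python
import math
import random


def make_tokens(num_tokens, dim, rng):
    """Random vectors standing in for trained prompts or token embeddings."""
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(num_tokens)]


def prepend_prompts(prompts, tokens):
    """Prepend learnable prompt tokens to an input token sequence."""
    return prompts + tokens


def toy_encoder(tokens):
    """Hypothetical stand-in for a frozen encoder: mean-pool, then L2-normalise."""
    dim = len(tokens[0])
    pooled = [sum(t[i] for t in tokens) / len(tokens) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in pooled)) or 1.0
    return [x / norm for x in pooled]


def quality_score(image_feat, good_feat, bad_feat, temperature=0.01):
    """Softmax weight of the 'good' text feature (assumed scoring rule)."""
    s_good = sum(a * b for a, b in zip(image_feat, good_feat)) / temperature
    s_bad = sum(a * b for a, b in zip(image_feat, bad_feat)) / temperature
    m = max(s_good, s_bad)  # subtract the max for numerical stability
    e_good, e_bad = math.exp(s_good - m), math.exp(s_bad - m)
    return e_good / (e_good + e_bad)


rng = random.Random(0)
dim = 8

# Learnable prompts for each branch (only these would be trained).
text_prompts = make_tokens(4, dim, rng)
visual_prompts = make_tokens(4, dim, rng)

# Stand-ins for word embeddings of two quality descriptions and image patches.
good_words = make_tokens(3, dim, rng)
bad_words = make_tokens(3, dim, rng)
patch_tokens = make_tokens(5, dim, rng)

good_feat = toy_encoder(prepend_prompts(text_prompts, good_words))
bad_feat = toy_encoder(prepend_prompts(text_prompts, bad_words))
image_feat = toy_encoder(prepend_prompts(visual_prompts, patch_tokens))

score = quality_score(image_feat, good_feat, bad_feat)
```

In a real setup the encoders would be the frozen CLIP text and image transformers, and only the prompt vectors would receive gradients; the sketch shows only where the prompts enter each branch and how a score could be read off the resulting features.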
Abstract
Recently, textual prompt tuning has shown inspirational performance in adapting Contrastive Language-Image Pre-training (CLIP) models to natural image quality assessment. However, such a uni-modal prompt learning method tunes only the language branch of CLIP models. This is not enough for adapting CLIP models to AI-generated image quality assessment (AGIQA), since AI-generated images (AGIs) differ visually from natural images. In addition, the consistency between AGIs and user input text prompts, which correlates with the perceptual quality of AGIs, has not been investigated as a guide for AGIQA. In this letter, we propose vision-language consistency guided multi-modal prompt learning for blind AGIQA, dubbed CLIP-AGIQA. Specifically, we introduce learnable textual and visual prompts in the language and vision branches of CLIP models, respectively. Moreover, we design a text-to-image alignment quality prediction task, whose…
Taxonomy
Topics: COVID-19 diagnosis using AI · Domain Adaptation and Few-Shot Learning · Infrastructure Maintenance and Monitoring
Methods: Contrastive Language-Image Pre-training
