Vision-Language Consistency Guided Multi-modal Prompt Learning for Blind AI Generated Image Quality Assessment

Jun Fu; Wei Zhou; Qiuping Jiang; Hantao Liu; Guangtao Zhai

arXiv:2406.16641·cs.CV·June 25, 2024·1 cites

Open Access · 1 Repo · 1 Dataset

TL;DR

This paper introduces a novel multi-modal prompt learning approach guided by vision-language consistency for blind AI generated image quality assessment, improving over existing methods by considering cross-modal alignment.

Contribution

It proposes a multi-modal prompt learning framework with vision-language consistency guidance specifically for AI generated image quality assessment, addressing limitations of uni-modal prompt tuning.

Findings

01. Outperforms state-of-the-art AGIQA models on public datasets.

02. Utilizes learnable prompts in both language and vision branches of CLIP.

03. Leverages vision-language alignment to improve quality prediction accuracy.

Abstract

Recently, textual prompt tuning has shown inspirational performance in adapting Contrastive Language-Image Pre-training (CLIP) models to natural image quality assessment. However, such a uni-modal prompt learning method only tunes the language branch of CLIP models. This is not enough for adapting CLIP models to AI generated image quality assessment (AGIQA), since AI generated images (AGIs) visually differ from natural images. In addition, the consistency between AGIs and user input text prompts, which correlates with the perceptual quality of AGIs, has not been investigated to guide AGIQA. In this letter, we propose vision-language consistency guided multi-modal prompt learning for blind AGIQA, dubbed CLIP-AGIQA. Specifically, we introduce learnable textual and visual prompts in the language and vision branches of CLIP models, respectively. Moreover, we design a text-to-image alignment quality prediction task, whose…
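
The general idea behind prompt-based quality prediction with CLIP can be illustrated with a small sketch. It is not the CLIP-AGIQA implementation: the embeddings are toy vectors rather than CLIP outputs, and the names `prepend_prompts` and `quality_score`, the "good"/"bad" anchor prompts, and the temperature of 0.01 are all assumptions made for illustration. The first function shows how learnable prompt vectors are commonly placed ahead of a frozen input sequence (word embeddings in a language branch, patch embeddings in a vision branch); the second turns cosine similarities between an image embedding and two text embeddings into a score in [0, 1].

```python
import math


def prepend_prompts(prompt_vectors, input_embeddings):
    """Place learnable prompt vectors in front of a frozen input sequence.

    Hypothetical helper: in prompt learning, only `prompt_vectors` would be
    updated during training, while the encoder that consumes the sequence
    stays frozen. The same pattern applies to text tokens and image patches.
    """
    return list(prompt_vectors) + list(input_embeddings)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def quality_score(image_emb, good_text_emb, bad_text_emb, temperature=0.01):
    """Softmax over similarities to a 'good' and a 'bad' text embedding.

    Assumption: quality is read off as the probability assigned to the
    'good' anchor. The temperature value is illustrative only.
    """
    s_good = cosine(image_emb, good_text_emb) / temperature
    s_bad = cosine(image_emb, bad_text_emb) / temperature
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(s_good, s_bad)
    e_good = math.exp(s_good - m)
    e_bad = math.exp(s_bad - m)
    return e_good / (e_good + e_bad)


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for encoder outputs.
    good, bad = [1.0, 0.0], [0.0, 1.0]
    print(quality_score([0.9, 0.1], good, bad))
    print(len(prepend_prompts([[0.1, 0.2]], [[1.0, 0.0], [0.0, 1.0]])))
```

In an actual multi-modal prompt learning setup, the prompt vectors in both branches would be trained against human quality ratings while the CLIP weights stay fixed; that training loop is omitted here.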

Code & Models

Repositories

junfu1995/clip-agiqa
PyTorch · Official

Datasets

Kylan12/Synthetic-AI-ML-Dataset
dataset · 42 dl

Taxonomy

Topics: COVID-19 diagnosis using AI · Domain Adaptation and Few-Shot Learning · Infrastructure Maintenance and Monitoring

Methods: Contrastive Language-Image Pre-training