Text-Visual Semantic Constrained AI-Generated Image Quality Assessment

Qiang Li; Qingsen Yan; Haojian Huang; Peng Wu; Haokui Zhang; Yanning Zhang

arXiv:2507.10432 · cs.CV · July 17, 2025

TL;DR

This paper introduces SC-AGIQA, a novel framework that improves AI-generated image quality assessment by addressing semantic misalignment and detail perception issues using text-visual constraints and frequency analysis.

Contribution

The paper proposes a unified assessment framework with two modules: TSAM for semantic alignment and FFDPM for fine-grained distortion detection, enhancing evaluation accuracy.
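The summary mentions frequency analysis for detail perception but does not describe how FFDPM works internally. As a generic illustration only (not the paper's module), the sketch below shows one simple way a frequency view can expose fine detail: it measures the share of spectral energy above a cutoff in a 1-D row of pixel intensities. The function name `high_frequency_ratio`, the cutoff, and the toy signals are all assumptions made for this example.

```python
import cmath
import math


def high_frequency_ratio(signal, cutoff):
    """Fraction of non-DC spectral energy at frequency index >= cutoff.

    Hypothetical helper for illustration; uses a plain O(n^2) DFT.
    """
    n = len(signal)
    energies = []
    # Frequencies 1 .. n//2 (the DC term at k = 0 is skipped on purpose).
    for k in range(1, n // 2 + 1):
        coeff = sum(
            signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        energies.append(abs(coeff) ** 2)
    total = sum(energies)
    if total == 0.0:
        return 0.0
    high = sum(e for k, e in enumerate(energies, start=1) if k >= cutoff)
    return high / total


# Toy pixel rows (assumed data): a smooth gradient-like row, and the same
# row with a fine, higher-frequency texture added on top.
n = 32
smooth_row = [math.sin(2 * math.pi * t / n) for t in range(n)]
detailed_row = [
    s + 0.5 * math.sin(2 * math.pi * 10 * t / n)
    for t, s in zip(range(n), smooth_row)
]

smooth_ratio = high_frequency_ratio(smooth_row, cutoff=5)
detailed_ratio = high_frequency_ratio(detailed_row, cutoff=5)
```

Here the textured row carries a clearly larger high-frequency share than the smooth one, which is the kind of signal a spatial-only feature can miss. A learned module would operate on 2-D images and learned features rather than a fixed cutoff.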

Findings

01 Outperforms existing methods on benchmark datasets

02 Effectively measures text-image semantic consistency

03 Accurately detects subtle visual distortions

Abstract

With the rapid advancement of Artificial Intelligence Generated Image (AGI) technology, accurately assessing the quality of these images has become an increasingly vital requirement. Prevailing methods typically rely on cross-modal models such as CLIP or BLIP to evaluate text-image alignment and visual quality. However, when applied to AGIs, these methods encounter two primary challenges: semantic misalignment and a lack of detail perception. To address these limitations, we propose Text-Visual Semantic Constrained AI-Generated Image Quality Assessment (SC-AGIQA), a unified framework that leverages text-visual semantic constraints to significantly enhance the comprehensive evaluation of both text-image consistency and perceptual distortion in AI-generated images. Our approach integrates key capabilities from multiple models and tackles the aforementioned challenges by introducing two core modules:…
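The abstract notes that prevailing methods use cross-modal models such as CLIP or BLIP to score text-image alignment. A minimal sketch of that baseline idea follows, assuming the text and image have already been encoded into vectors in a shared embedding space; the embedding values here are toy numbers invented for the example, not real CLIP outputs, and `cosine_similarity` is a helper written for this illustration rather than part of any library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero vector has no direction")
    return dot / (norm_a * norm_b)


# Toy stand-ins for embeddings (hypothetical values for illustration only).
text_emb = [0.2, 0.9, 0.1]              # embedding of the prompt
aligned_image_emb = [0.25, 0.85, 0.05]  # image that matches the prompt
misaligned_image_emb = [0.9, 0.1, 0.3]  # image that drifts from the prompt

aligned_score = cosine_similarity(text_emb, aligned_image_emb)
misaligned_score = cosine_similarity(text_emb, misaligned_image_emb)
```

A single global similarity like this compresses the whole prompt and image into one number, which is one plausible reason such scores can miss partial semantic mismatches and fine visual detail, the two challenges the abstract names.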

Code & Models

Repositories

mozhu1/SC-AGIQA (PyTorch, official)

Taxonomy

Methods: BLIP (Bootstrapping Language-Image Pre-training) · CLIP (Contrastive Language-Image Pre-training)