TL;DR
This paper introduces SC-AGIQA, a novel framework that improves AI-generated image quality assessment by addressing semantic misalignment and detail perception issues using text-visual constraints and frequency analysis.
Contribution
The paper proposes a unified assessment framework with two modules: TSAM for semantic alignment and FFDPM for fine-grained distortion detection, enhancing evaluation accuracy.
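The general idea of using frequency analysis to expose fine-grained distortions can be illustrated with a much simpler measure. This is a minimal sketch and not the paper's FFDPM. It assumes a grayscale patch given as a list of rows with intensities in [0, 1]. The function names `dft2` and `high_freq_ratio` and the `cutoff` parameter are hypothetical. The code computes a naive 2D DFT and reports the share of non-DC spectral energy above a normalized radial frequency. Fine texture or high-frequency artifacts push the ratio up, while smooth gradients keep it low.

```python
import cmath
import math
from typing import List


def dft2(img: List[List[float]]) -> List[List[complex]]:
    """Naive 2D discrete Fourier transform, O((H*W)^2); only meant for tiny patches."""
    h, w = len(img), len(img[0])
    out = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            s = 0j
            for y in range(h):
                for x in range(w):
                    s += img[y][x] * cmath.exp(-2j * math.pi * (u * y / h + v * x / w))
            out[u][v] = s
    return out


def high_freq_ratio(img: List[List[float]], cutoff: float = 0.25) -> float:
    """Fraction of non-DC spectral energy whose normalized radial frequency exceeds `cutoff`.

    Hypothetical helper for illustration; assumes intensities in [0, 1].
    """
    h, w = len(img), len(img[0])
    spec = dft2(img)
    total = high = 0.0
    for u in range(h):
        for v in range(w):
            if u == 0 and v == 0:
                continue  # skip the DC term (mean brightness carries no detail)
            # Map indices above the midpoint to negative frequencies, in cycles/pixel.
            fu = (u if u <= h // 2 else u - h) / h
            fv = (v if v <= w // 2 else v - w) / w
            energy = abs(spec[u][v]) ** 2
            total += energy
            if math.hypot(fu, fv) > cutoff:
                high += energy
    if total < 1e-9:
        return 0.0  # (near-)constant patch: no structure to measure
    return high / total


checker = [[float((x + y) % 2) for x in range(8)] for y in range(8)]
ramp = [[x / 7 for x in range(8)] for y in range(8)]
print(high_freq_ratio(checker), high_freq_ratio(ramp))
```

A pixel-level checkerboard puts almost all of its energy at the highest frequency, so its ratio is close to 1. A horizontal ramp concentrates energy at low frequencies, so its ratio is much smaller. A practical version would use an FFT routine instead of this quadratic loop.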
Findings
Outperforms existing methods on benchmark datasets
Effectively measures text-image semantic consistency
Accurately detects subtle visual distortions
Abstract
With the rapid advancements in Artificial Intelligence Generated Image (AGI) technology, accurately assessing the quality of these images has become an increasingly vital requirement. Prevailing methods typically rely on cross-modal models such as CLIP or BLIP to evaluate text-image alignment and visual quality. However, when applied to AGIs, these methods face two primary challenges: semantic misalignment and missing detail perception. To address these limitations, we propose Text-Visual Semantic Constrained AI-Generated Image Quality Assessment (SC-AGIQA), a unified framework that leverages text-visual semantic constraints to significantly enhance the comprehensive evaluation of both text-image consistency and perceptual distortion in AI-generated images. Our approach integrates key capabilities from multiple models and tackles the aforementioned challenges by introducing two core modules:…
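The CLIP- or BLIP-style text-image alignment mentioned in the abstract usually comes down to a cosine similarity between a text embedding and an image embedding. This is a minimal sketch, assuming the embeddings were already produced by some encoder; short toy vectors stand in for them here. `alignment_score` and its `weight` parameter are hypothetical. The clip-at-zero and 2.5 rescaling follow the CLIPScore convention, not SC-AGIQA.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def alignment_score(text_emb: Sequence[float], image_emb: Sequence[float],
                    weight: float = 2.5) -> float:
    """Hypothetical CLIPScore-style alignment: weight * max(cos, 0)."""
    return weight * max(cosine_similarity(text_emb, image_emb), 0.0)


# Toy 2-D "embeddings"; real CLIP embeddings have hundreds of dimensions.
print(alignment_score([1.0, 0.0], [1.0, 0.0]))  # identical direction
print(alignment_score([1.0, 0.0], [0.0, 1.0]))  # orthogonal, clipped to zero
```

A single global similarity like this is the kind of signal that, according to the abstract, can suffer from semantic misalignment on AGIs. That gap is the one SC-AGIQA targets.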
Taxonomy
Methods: BLIP (Bootstrapping Language-Image Pre-training) · CLIP (Contrastive Language-Image Pre-training)
