TL;DR
M3-AGIQA is a comprehensive, multi-aspect evaluation framework for AI-generated images that uses multimodal large language models and multi-round analysis to produce human-aligned, interpretable quality scores.
Contribution
It introduces a novel multi-round, multi-aspect evaluation method leveraging MLLMs for holistic assessment of AI-generated images, addressing perceptual quality, prompt correspondence, and authenticity.
Findings
Achieves state-of-the-art performance on multiple benchmarks.
Demonstrates strong generalizability across datasets.
Provides interpretable, human-aligned quality scores.
Abstract
The rapid advancement of AI-generated image (AIGI) models presents new challenges for evaluating image quality, particularly across three aspects: perceptual quality, prompt correspondence, and authenticity. To address these challenges, we introduce M3-AGIQA, a comprehensive framework that leverages Multimodal Large Language Models (MLLMs) to enable more human-aligned, holistic evaluation of AI-generated images across both visual and textual domains. In addition, our framework features a structured multi-round evaluation process that generates and analyzes intermediate image descriptions to provide deeper insight into these three aspects. By aligning model outputs more closely with human judgment, M3-AGIQA delivers robust and interpretable quality scores. Extensive experiments on multiple benchmarks demonstrate that our method achieves state-of-the-art performance on tested datasets and…
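The multi-round process described in the abstract can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the `ask` callable is a hypothetical stand-in for an MLLM call, the question wording and the 1-to-5 scale are invented for illustration, and `fake_mllm` is a stub that returns canned text so the example runs without any model.

```python
from typing import Callable, Dict, List, Tuple

# The three aspects named in the abstract.
ASPECTS = ("perceptual quality", "prompt correspondence", "authenticity")

# Hypothetical MLLM interface (an assumption, not a real API):
# takes the conversation so far and a new question, returns text.
History = List[Tuple[str, str]]
AskFn = Callable[[History, str], str]


def multi_round_evaluate(image_id: str, prompt: str, ask: AskFn) -> Dict[str, object]:
    """Ask one question per aspect, then ask for a final score.

    Each round sees the earlier questions and answers, so the final
    rating is conditioned on the intermediate descriptions.
    """
    history: History = []
    descriptions: Dict[str, str] = {}

    # Rounds 1-3: collect an intermediate description for each aspect.
    for aspect in ASPECTS:
        question = (
            f"Describe the {aspect} of image {image_id!r} "
            f"generated from the prompt {prompt!r}."
        )
        answer = ask(history, question)
        history.append((question, answer))
        descriptions[aspect] = answer

    # Final round: request a numeric rating (illustrative 1-5 scale).
    final_question = (
        "Given the descriptions above, rate the overall quality "
        "from 1 to 5. Reply with a number only."
    )
    score = float(ask(history, final_question))

    return {
        "descriptions": descriptions,
        "score": score,
        "rounds": len(history) + 1,
    }


def fake_mllm(history: History, question: str) -> str:
    """Stub model: canned replies so the sketch runs offline."""
    if question.startswith("Given"):
        return "3.5"
    return f"(stub description, round {len(history) + 1})"


result = multi_round_evaluate("img_001", "a red fox in snow", fake_mllm)
print(result["score"], result["rounds"])
```

Keeping the intermediate descriptions alongside the score is what makes the output inspectable: a reader can see which aspect-level observations preceded the final rating.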
Taxonomy
Methods: ALIGN
