Automatic Evaluation for Text-to-image Generation: Task-decomposed Framework, Distilled Training, and Meta-evaluation Benchmark
Rong-Cheng Tu, Zi-Ao Ma, Tian Lan, Yuehao Zhao, Heyan Huang, Xian-Ling Mao

TL;DR
This paper introduces a task-decomposed evaluation framework for text-to-image generation, distills GPT-4o's evaluation ability into an open-source model, and creates a comprehensive benchmark, significantly improving automatic image quality assessment.
Contribution
The paper proposes a novel task decomposition evaluation framework, a distillation training strategy that transfers GPT-4o's evaluation ability to open-source models, and a new meta-evaluation benchmark for automatic image quality evaluation.
Findings
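The distilled open-source model outperforms the GPT-4o-based baseline by 4.6% on correlation metrics.
Task decomposition splits the complex evaluation task into simpler sub-tasks, reducing evaluation complexity and improving the quality of the constructed training dataset.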
The benchmark enables more reliable and comprehensive assessment of evaluation models.
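Meta-evaluation of this kind typically scores an automatic evaluator by how well its ratings correlate with human judgments of the same images. The sketch below assumes Spearman's rank correlation as the metric; this summary does not state which correlation metrics the paper reports, and the function names (`rank`, `pearson`, `spearman`) and the sample ratings are illustrative, not taken from the paper.

```python
from __future__ import annotations

from typing import Sequence


def rank(values: Sequence[float]) -> list[float]:
    # 1-based ranks; tied values share the average of their positions.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of a run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    # Pearson correlation; assumes equal lengths and non-constant inputs.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    # Spearman's rho: Pearson correlation of the two rank vectors.
    return pearson(rank(x), rank(y))


if __name__ == "__main__":
    human = [1, 2, 3, 4, 5]            # hypothetical human ratings
    model = [1.5, 1.0, 3.2, 4.8, 4.9]  # hypothetical evaluator scores
    print(spearman(human, model))  # 0.9
```

Here the hypothetical evaluator swaps only the two lowest-rated images, so rho is 0.9 rather than 1.0. Ranking before correlating makes the score insensitive to each evaluator's scale, which matters when comparing evaluators that output scores on different ranges. A constant score vector would make the denominator zero, so real use should guard against that case.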
Abstract
Driven by the remarkable progress in diffusion models, text-to-image generation has made significant strides, creating a pressing demand for automatic quality evaluation of generated images. Current state-of-the-art automatic evaluation methods heavily rely on Multi-modal Large Language Models (MLLMs), particularly powerful commercial models like GPT-4o. While these models are highly effective, their substantial costs limit scalability in large-scale evaluations. Adopting open-source MLLMs is an alternative; however, their performance falls short due to significant limitations in processing multi-modal data compared to commercial MLLMs. To tackle these problems, we first propose a task decomposition evaluation framework based on GPT-4o to automatically construct a new training dataset, where the complex evaluation task is decoupled into simpler sub-tasks, effectively reducing the…
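To make the idea of decoupling concrete, here is a minimal sketch of a decomposed evaluation. It assumes the prompt has already been split into elements, that each element becomes one simpler question, and that sub-task scores are averaged. `SubTask`, `decompose`, `evaluate`, and the `judge` callable (a stand-in for an MLLM such as GPT-4o or the distilled open-source model) are hypothetical; the paper's actual sub-tasks and aggregation rule are not described in this summary.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class SubTask:
    # Hypothetical sub-task: one prompt element paired with a narrow question.
    element: str
    question: str


def decompose(prompt_elements: list[str]) -> list[SubTask]:
    # Hypothetical decomposition: one simple question per prompt element.
    return [
        SubTask(element=e, question=f"Does the image contain {e}?")
        for e in prompt_elements
    ]


def evaluate(subtasks: list[SubTask], judge: Callable[[str], float]) -> float:
    # `judge` stands in for an MLLM call and must return a score in [0, 1].
    # Aggregation by simple mean is an assumption, not the paper's rule.
    if not subtasks:
        raise ValueError("need at least one sub-task")
    scores = [judge(t.question) for t in subtasks]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Stub judge for illustration only.
    def stub_judge(question: str) -> float:
        return 1.0 if "apple" in question else 0.5

    tasks = decompose(["a red apple", "a wooden table"])
    print(evaluate(tasks, stub_judge))  # 0.75
```

Each judge call answers one narrow question instead of the whole evaluation at once, which is the kind of simplification the framework aims for. Swapping the stub for a real model call would leave `decompose` and `evaluate` unchanged.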
Taxonomy
Topics: Data Visualization and Analytics · Virtual Reality Applications and Impacts · Scientific Computing and Data Management
Methods: Diffusion
