Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings
Olivia Wiles, Chuhan Zhang, Isabela Albuquerque, Ivana Kajić, Su Wang, Emanuele Bugliarello, Yasumasa Onoe, Pinelopi Papalampidi, Ira Ktena, Chris Knutsen, Cyrus Rashtchian, Anant Nawalgaria, Jordi Pont-Tuset, Aida Nematzadeh

TL;DR
This paper systematically evaluates text-to-image models and metrics, introduces a skills-based benchmark for better model discrimination, and proposes a new auto-eval metric that aligns more closely with human judgments.
Contribution
It presents a comprehensive skills-based benchmark, extensive human rating data, and a novel QA-based auto-eval metric for improved T2I model evaluation.
Findings
The skills-based benchmark effectively discriminates models across different prompt complexities.
The collected human ratings reveal the impact of prompt ambiguity and model differences.
The new auto-eval metric correlates better with human ratings than existing metrics.
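One common way to check how well an auto-eval metric tracks human ratings is a rank correlation between per-prompt metric scores and per-prompt human scores. The sketch below uses Spearman correlation as an assumption; this summary does not say which correlation statistic the paper reports. All scores, and the names `human_ratings`, `metric_a`, and `metric_b`, are made-up placeholders for illustration, not data from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical per-prompt scores (illustrative only, not from the paper).
human_ratings = [0.9, 0.2, 0.6, 0.4, 0.8]
metric_a = [0.85, 0.30, 0.55, 0.35, 0.70]
metric_b = [0.50, 0.60, 0.40, 0.70, 0.65]

for name, scores in [("metric_a", metric_a), ("metric_b", metric_b)]:
    print(name, round(spearman(human_ratings, scores), 3))
```

A metric whose scores order the prompts the same way the human raters do gets a correlation near 1, while one that orders them differently scores lower or negative. That ordering agreement is the sense in which one metric "correlates better" than another.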
Abstract
While text-to-image (T2I) generative models have become ubiquitous, they do not necessarily generate images that align with a given prompt. While previous work has evaluated T2I alignment by proposing metrics, benchmarks, and templates for collecting human judgements, the quality of these components is not systematically measured. Human-rated prompt sets are generally small and the reliability of the ratings -- and thereby the prompt set used to compare models -- is not evaluated. We address this gap by performing an extensive study evaluating auto-eval metrics and human templates. We provide three main contributions: (1) We introduce a comprehensive skills-based benchmark that can discriminate models across different human templates. This skills-based benchmark categorises prompts into sub-skills, allowing a practitioner to pinpoint not only which skills are challenging, but at what…
Taxonomy
Topics: Species Distribution and Climate Change
Methods: Sparse Evolutionary Training · ALIGN
