Finding the Subjective Truth: Collecting 2 Million Votes for Comprehensive Gen-AI Model Evaluation

Dimitrios Christodoulou; Mads Kuhlmann-Jørgensen

arXiv:2409.11904·cs.CV·October 16, 2024

TL;DR

This paper introduces a large-scale, diverse human annotation framework for evaluating text-to-image models, enabling comprehensive and bias-mitigated ranking of model performance based on subjective criteria.

Contribution

The paper presents an efficient annotation method that draws on feedback from a global pool of annotators, collecting over 2 million votes to evaluate four text-to-image models on style preference, coherence, and text-to-image alignment.

Findings

1. Successful collection of 2 million annotations from diverse annotators
2. Effective ranking of models based on subjective criteria
3. Reduced bias through demographic diversity

Abstract

Efficiently evaluating the performance of text-to-image models is difficult as it inherently requires subjective judgment and human preference, making it hard to compare different models and quantify the state of the art. Leveraging Rapidata's technology, we present an efficient annotation framework that sources human feedback from a diverse, global pool of annotators. Our study collected over 2 million annotations across 4,512 images, evaluating four prominent models (DALL-E 3, Flux.1, MidJourney, and Stable Diffusion) on style preference, coherence, and text-to-image alignment. We demonstrate that our approach makes it feasible to comprehensively rank image generation models based on a vast pool of annotators and show that the diverse annotator demographics reflect the world population, significantly decreasing the risk of biases.
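To make the idea of ranking models from a large pool of votes concrete, here is a minimal sketch in Python. It assumes each vote is a pairwise comparison recorded as a (winner, loser) pair and ranks models by simple win rate; the abstract does not state the vote format or the aggregation rule, so both are assumptions, and the function name `rank_by_win_rate` and the model labels are hypothetical. The toy votes are made up for illustration and are not data from the paper.

```python
from collections import defaultdict


def rank_by_win_rate(votes):
    """Rank models by the fraction of pairwise comparisons they won.

    `votes` is an iterable of (winner, loser) pairs, one per annotator vote.
    Assumption: votes are pairwise; the paper's actual scheme may differ.
    """
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for winner, loser in votes:
        wins[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1
    # Win rate = comparisons won / comparisons the model took part in.
    rates = {model: wins[model] / appearances[model] for model in appearances}
    # Highest win rate first.
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Made-up toy votes with placeholder model names (not results from the paper).
toy_votes = [
    ("model_a", "model_b"),
    ("model_a", "model_c"),
    ("model_b", "model_c"),
    ("model_a", "model_b"),
    ("model_c", "model_b"),
]

ranking = rank_by_win_rate(toy_votes)
for model, rate in ranking:
    print(f"{model}: {rate:.2f}")
```

In practice one ranking per criterion (style preference, coherence, text-to-image alignment) could be computed by filtering the votes before calling the function. Win rate is only the simplest aggregation; other schemes, such as rating systems that account for opponent strength, are equally plausible.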

Taxonomy

Topics: Explainable Artificial Intelligence (XAI)