GenAI Arena: An Open Evaluation Platform for Generative Models
Dongfu Jiang, Max Ku, Tianle Li, Yuansheng Ni, Shizhuo Sun, Rongqi Fan, Wenhu Chen

TL;DR
GenAI-Arena is an open platform that crowdsources user evaluations to provide a more trustworthy and comprehensive assessment of image and video generative models, addressing limitations of existing automatic metrics.
Contribution
The paper introduces GenAI-Arena, a community-driven evaluation platform for generative models, and releases a benchmark dataset to improve model assessment methods.
Findings
Existing multimodal large language models (MLLMs) lag behind humans in judging the quality of visual content.
GPT-4o achieves only 49.19% accuracy in mimicking human votes.
Open-source MLLMs perform worse due to limited reasoning abilities.
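The summary does not define "accuracy in mimicking human votes". The sketch below assumes the simplest reading: the fraction of comparisons in which the model judge casts exactly the same vote as the human. The vote labels (`"left"`, `"right"`, `"tie"`) and the function name `agreement_accuracy` are hypothetical and are not taken from the paper's released data.

```python
def agreement_accuracy(judge_votes: list[str], human_votes: list[str]) -> float:
    """Fraction of comparisons where the judge's vote exactly matches the human vote.

    Assumption: accuracy is exact-match agreement over categorical vote labels.
    The label strings used below are illustrative, not the paper's schema.
    """
    if len(judge_votes) != len(human_votes):
        raise ValueError("vote lists must be the same length")
    if not human_votes:
        raise ValueError("need at least one vote")
    # Count positions where the judge and the human chose the same outcome.
    matches = sum(j == h for j, h in zip(judge_votes, human_votes))
    return matches / len(human_votes)


# Hypothetical example: the judge agrees with the human on 2 of 4 comparisons.
human = ["left", "right", "tie", "left"]
judge = ["left", "left", "tie", "right"]
print(f"{agreement_accuracy(judge, human):.2%}")  # 50.00%
```

Under this reading, a score of 49.19% would mean the judge reproduces the human vote in slightly fewer than half of the comparisons.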
Abstract
Generative AI has made remarkable strides toward revolutionizing fields such as image and video generation. These advancements are driven by innovative algorithms, architectures, and data. However, the rapid proliferation of generative models has exposed a critical gap: the absence of trustworthy evaluation metrics. Current automatic metrics such as FID, CLIP score, and FVD often fail to capture the nuanced quality of generated outputs and the satisfaction of the users who view them. This paper proposes GenAI-Arena, an open platform for evaluating image and video generative models in which users actively participate in the evaluation. By leveraging collective user feedback and votes, GenAI-Arena aims to provide a more democratic and accurate measure of model performance. It covers three tasks: text-to-image generation, text-to-video generation, and image editing.…
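The abstract does not say how individual votes become a "measure of model performance". One common approach on arena-style leaderboards is to treat each vote as a pairwise match and update Elo ratings. The sketch below shows that technique under stated assumptions. The vote format `(model_a, model_b, outcome)`, the base rating of 1000, and the K-factor are illustrative choices, not details taken from this summary.

```python
from collections import defaultdict

# Assumed vote record: (model_a, model_b, outcome), where outcome is
# "a" (A preferred), "b" (B preferred), or "tie". Hypothetical format.
Vote = tuple[str, str, str]


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def elo_ratings(votes: list[Vote], k: float = 4.0, base: float = 1000.0) -> dict[str, float]:
    """Sequentially update Elo ratings from pairwise human votes.

    Each vote moves the two ratings by equal and opposite amounts,
    so the total rating mass stays constant (zero-sum updates).
    """
    ratings: dict[str, float] = defaultdict(lambda: base)
    score_of = {"a": 1.0, "b": 0.0, "tie": 0.5}
    for model_a, model_b, outcome in votes:
        ra, rb = ratings[model_a], ratings[model_b]
        ea = expected_score(ra, rb)  # A's expected score before this vote
        sa = score_of[outcome]       # A's actual score from this vote
        ratings[model_a] = ra + k * (sa - ea)
        ratings[model_b] = rb + k * ((1.0 - sa) - (1.0 - ea))
    return dict(ratings)


# Hypothetical votes between two placeholder models.
votes = [
    ("model-x", "model-y", "a"),
    ("model-x", "model-y", "a"),
    ("model-y", "model-x", "tie"),
]
print(elo_ratings(votes, k=32))
```

Sequential Elo depends on the order of the votes. A leaderboard may instead fit a Bradley-Terry model over all votes at once to obtain order-independent scores. Both approaches turn the same pairwise preferences into a single ranking.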
Taxonomy
Topics: Semantic Web and Ontologies
Methods: Contrastive Language-Image Pre-training
