GenAI Arena: An Open Evaluation Platform for Generative Models

Dongfu Jiang; Max Ku; Tianle Li; Yuansheng Ni; Shizhuo Sun; Rongqi Fan; Wenhu Chen

arXiv:2406.04485 · cs.AI · November 12, 2024 · 2 citations

Open Access · 1 Repo · 1 Dataset · 1 Video

TL;DR

GenAI-Arena is an open platform that crowdsources user evaluations to provide a more trustworthy and comprehensive assessment of image and video generative models, addressing limitations of existing automatic metrics.

Contribution

The paper introduces GenAI-Arena, a community-driven evaluation platform for generative models, and releases a benchmark dataset to improve model assessment methods.

Findings

01

Existing multimodal models lag in judging visual content quality.

02

GPT-4o achieves only 49.19% accuracy in mimicking human votes.

03

Open-source MLLMs perform worse due to limited reasoning abilities.
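
The accuracy figure in the findings above measures how often a model's preference matches the human vote on the same comparison. A minimal sketch of that agreement metric follows, assuming each vote is recorded as one label per comparison. The label names ("left", "right", "tie"), the function name, and the sample data are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def judge_accuracy(human_votes: Sequence[str], judge_votes: Sequence[str]) -> float:
    """Return the fraction of comparisons where the judge agrees with the human vote."""
    if len(human_votes) != len(judge_votes):
        raise ValueError("both vote lists must cover the same comparisons")
    if not human_votes:
        raise ValueError("at least one comparison is required")
    # Count positions where the judge picked the same label as the human voter.
    matches = sum(h == j for h, j in zip(human_votes, judge_votes))
    return matches / len(human_votes)


# Hypothetical data: four comparisons, labels are illustrative only.
human = ["left", "right", "tie", "left"]
judge = ["left", "left", "tie", "right"]
accuracy = judge_accuracy(human, judge)
print(f"agreement with human votes: {accuracy:.2%}")
```

With three possible labels, a judge guessing uniformly at random would agree about a third of the time, which gives a rough baseline for reading such a score.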

Abstract

Generative AI has made remarkable strides in revolutionizing fields such as image and video generation. These advancements are driven by innovative algorithms, architectures, and data. However, the rapid proliferation of generative models has highlighted a critical gap: the absence of trustworthy evaluation metrics. Current automatic assessments such as FID, CLIP, and FVD often fail to capture the nuanced quality and user satisfaction associated with generative outputs. This paper proposes GenAI-Arena, an open platform for evaluating image and video generative models in which users actively participate in the evaluation. By leveraging collective user feedback and votes, GenAI-Arena aims to provide a more democratic and accurate measure of model performance. It covers three tasks: text-to-image generation, text-to-video generation, and image editing.…
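
To illustrate how collected votes could be turned into a model ranking, the sketch below aggregates pairwise votes into per-model win rates. This excerpt does not specify the platform's actual ranking method, so a plain win rate is used here as a stand-in. The vote format, the outcome labels ("a", "b", "tie"), and the model names are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical vote record: (model_a, model_b, outcome), outcome in {"a", "b", "tie"}.
Vote = Tuple[str, str, str]


def win_rates(votes: Iterable[Vote]) -> Dict[str, float]:
    """Compute each model's win rate, counting a tie as half a win for both sides."""
    wins: Dict[str, float] = defaultdict(float)
    games: Dict[str, int] = defaultdict(int)
    for model_a, model_b, outcome in votes:
        games[model_a] += 1
        games[model_b] += 1
        if outcome == "a":
            wins[model_a] += 1.0
        elif outcome == "b":
            wins[model_b] += 1.0
        elif outcome == "tie":
            wins[model_a] += 0.5
            wins[model_b] += 0.5
        else:
            raise ValueError(f"unknown outcome: {outcome!r}")
    return {model: wins[model] / games[model] for model in games}


# Hypothetical votes between three placeholder models.
sample_votes = [
    ("model_x", "model_y", "a"),
    ("model_y", "model_z", "tie"),
    ("model_x", "model_z", "a"),
    ("model_y", "model_x", "b"),
]
rates = win_rates(sample_votes)
ranking = sorted(rates, key=rates.get, reverse=True)
print(ranking)
```

A raw win rate ignores opponent strength, so a rating system fitted to the same votes would usually be a more robust choice once models have faced different sets of opponents.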

Code & Models

Repositories

TIGER-AI-Lab/ImagenHub
PyTorch

Datasets

TIGER-Lab/GenAI-Bench
dataset · 129 downloads

Videos

GenAI Arena: An Open Evaluation Platform for Generative Models · SlidesLive

Taxonomy

Topics: Semantic Web and Ontologies

Methods: Contrastive Language-Image Pre-training