Gaming the Arena: AI Model Evaluation and the Viral Capture of Attention
Sam Hind

TL;DR
This paper explores how AI model evaluation has shifted towards competitive 'arena' formats driven by a desire for attention, impacting AI innovation, commercialization, and community dynamics.
Contribution
It introduces the concept of 'arena-ization' in AI evaluation, analyzing its social and commercial implications through a technographic study of LMArena.
Findings
AI evaluation increasingly takes the form of gladiatorial 'battles' staged to attract attention
'Arena-ization' fuels AI commercialization and community engagement
Model developers engage in 'arena gaming' to capture attention and scale their models
Abstract
Innovation in artificial intelligence (AI) has always been dependent on technological infrastructures, from code repositories to computing hardware. Yet industry -- rather than universities -- has become increasingly influential in shaping AI innovation. As generative forms of AI powered by large language models (LLMs) have driven the breakout of AI into the wider world, the AI community has sought to develop new methods for independently evaluating the performance of AI models. How best, in other words, to compare the performance of AI models against other AI models -- and how best to account for new models launched on nearly a daily basis? Building on recent work in media studies, STS, and computer science on benchmarking and the practices of AI evaluation, I examine the rise of so-called 'arenas' in which AI models are evaluated with reference to gladiatorial-style 'battles'. Through…
Taxonomy
Topics: Ethics and Social Impacts of AI · Computational and Text Analysis Methods · Artificial Intelligence in Healthcare and Education
