Improving Your Model Ranking on Chatbot Arena by Vote Rigging

Rui Min; Tianyu Pang; Chao Du; Qian Liu; Minhao Cheng; Min Lin

arXiv:2501.17858 · cs.CL · August 12, 2025

TL;DR

This paper demonstrates that Chatbot Arena's ranking system can be manipulated through vote rigging strategies, especially by exploiting the Elo rating mechanism, raising concerns about the reliability of such leaderboards.

Contribution

The paper introduces both target-only and omnipresent rigging strategies that can effectively manipulate chatbot rankings, highlighting vulnerabilities in the Elo-based rating system.

Findings

01

Rigging can improve model rankings with only hundreds of votes.

02

Target-only rigging is inefficient because only about 1% of new battles involve the target model.

03

The Elo rating system can be exploited through votes on battles that do not involve the target model.
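The indirect effect in the last finding can be illustrated with a minimal sketch of an online Elo update. This is not the authors' code: the model names, the starting ratings, the step size `k=4.0`, and the base-10, scale-400 logistic are all illustrative assumptions. It only shows that a vote on a battle without the target model can still change the target's rank.

```python
def expected_score(r_a, r_b, scale=400.0):
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / scale))


def elo_update(ratings, winner, loser, k=4.0):
    """Apply one online Elo update in place for a decisive vote.

    k is an assumed step size, not a value taken from the paper.
    """
    e_w = expected_score(ratings[winner], ratings[loser])
    delta = k * (1.0 - e_w)
    ratings[winner] += delta
    ratings[loser] -= delta


def rank_of(ratings, name):
    """1-based rank of `name` when models are sorted by rating, highest first."""
    ordered = sorted(ratings, key=ratings.get, reverse=True)
    return ordered.index(name) + 1


# Hypothetical ratings: "rival" sits just above "target".
ratings = {"rival": 1201.0, "target": 1200.0, "other": 1100.0}

before = rank_of(ratings, "target")
# A single vote on a battle that does not involve "target" at all.
elo_update(ratings, winner="other", loser="rival")
after = rank_of(ratings, "target")

print(before, after, ratings["target"])
```

In this toy setup the target's rating never changes, yet it moves up one place because the model ranked just above it lost rating.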

Abstract

Chatbot Arena is a popular platform for evaluating LLMs by pairwise battles, where users vote for their preferred response from two randomly sampled anonymous models. While Chatbot Arena is widely regarded as a reliable LLM ranking leaderboard, we show that crowdsourced voting can be rigged to improve (or decrease) the ranking of a target model $m_{t}$. We first introduce a straightforward target-only rigging strategy that focuses on new battles involving $m_{t}$, identifying it via watermarking or a binary classifier, and exclusively voting for $m_{t}$ wins. However, this strategy is practically inefficient because there are over $190$ models on Chatbot Arena and on average only about $1\%$ of new battles will involve $m_{t}$. To overcome this, we propose omnipresent rigging strategies, exploiting the Elo rating mechanism of Chatbot Arena that any new vote on a battle can influence the…
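The roughly 1% figure quoted in the abstract is consistent with a simple back-of-the-envelope check. The sketch below assumes exactly 190 models and that every pair of models is equally likely to be sampled for a battle. Both are simplifying assumptions; the abstract says only "over 190" and does not state the sampling distribution.

```python
from math import comb

n_models = 190  # assumed round number; the abstract says "over 190"

# Number of distinct model pairs that could be drawn for a battle.
total_pairs = comb(n_models, 2)

# Pairs that include the target model: it can face any of the other models.
pairs_with_target = n_models - 1

# Share of battles involving the target under uniform pair sampling.
share = pairs_with_target / total_pairs

print(f"{share:.4%}")
```

Under these assumptions the share simplifies to 2 / n_models, so a target-only rigger would expect to use only about one in every hundred battles it encounters.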

Code & Models

Repositories

sail-sg/rigging-chatbotarena
PyTorch · Official

Videos

Improving Your Model Ranking on Chatbot Arena by Vote Rigging · SlidesLive

Taxonomy

Topics: Sentiment Analysis and Opinion Mining · Topic Modeling · Spam and Phishing Detection