A Unified Perturbation Framework for Analyzing Leaderboard Stability and Manipulation

Hosna Oyarhoseini; Jimmy Lin; Amir-Hossein Karimi

arXiv:2605.15761 · cs.LG · May 18, 2026

TL;DR

This paper introduces a unified perturbation framework for analyzing the robustness of Bradley-Terry leaderboards used to benchmark large language models, and shows that their rankings are vulnerable to small targeted data perturbations.

Contribution

It presents a novel influence-based perturbation framework for assessing and manipulating leaderboard rankings, highlighting their non-robustness and proposing more reliable evaluation methods.

Findings

01

Modern leaderboards are highly sensitive to minimal targeted perturbations: sub-1% perturbations can change the top-ranked model.

02

Small data modifications can change top-k membership, degrade Kendall's tau, and alter confidence intervals.

03

The framework enables efficient targeted manipulations to promote or demote models.

Abstract

Evaluation leaderboards such as LMArena play a central role in benchmarking large language models by aggregating pairwise human preferences into model rankings, yet the robustness of these rankings remains poorly understood. We present a unified perturbation framework for analyzing Bradley-Terry leaderboards under structured data modifications using influence-based approximations. Our framework studies three match-level perturbations -- Drop, Add, and Flip -- together with player removal, and evaluates their effects on top-k membership, global ranking consistency via Kendall's tau, and confidence-interval-based uncertainty. Across Chatbot Arena and six additional pairwise-comparison datasets, we show that modern leaderboards are non-robust across all three objectives: sub-1% targeted perturbations can change the top-ranked model, degrade Kendall's tau, and alter confidence intervals.…
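
The match-level perturbations named in the abstract can be illustrated with a small sketch. This is not the authors' method: the paper uses influence-based approximations, whereas the code below simply refits a Bradley-Terry model from scratch after each perturbation. The function names (`fit_bradley_terry`, `drop`, `add`, `flip`, `top_k`, `kendall_tau`), the pseudo-count `prior`, and the toy match data are all assumptions made for illustration.

```python
import math
from itertools import combinations


def fit_bradley_terry(matches, n_players, iters=500, prior=0.1):
    """Return log-strengths from (winner, loser) pairs via MM updates.

    `prior` is a symmetric pseudo-count (an assumption of this sketch) that
    keeps every strength finite even if a player never wins.
    """
    wins = [[0.0 if i == j else prior for j in range(n_players)]
            for i in range(n_players)]
    for winner, loser in matches:
        wins[winner][loser] += 1.0
    p = [1.0] * n_players
    for _ in range(iters):
        new_p = []
        for i in range(n_players):
            total_wins = sum(wins[i])
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(n_players) if j != i)
            new_p.append(total_wins / denom)
        scale = n_players / sum(new_p)  # fix the overall scale of strengths
        p = [x * scale for x in new_p]
    return [math.log(x) for x in p]


def drop(matches, idx):
    """Drop: remove the match at position idx."""
    return matches[:idx] + matches[idx + 1:]


def add(matches, winner, loser):
    """Add: append one new match outcome."""
    return matches + [(winner, loser)]


def flip(matches, idx):
    """Flip: reverse the outcome of the match at position idx."""
    winner, loser = matches[idx]
    return matches[:idx] + [(loser, winner)] + matches[idx + 1:]


def top_k(scores, k):
    """Indices of the k highest-scoring players, best first."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]


def kendall_tau(a, b):
    """Kendall's tau-a between two score vectors (ties count as neither)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(a)), 2):
        s = (a[i] - a[j]) * (b[i] - b[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(a) * (len(a) - 1) // 2
    return (concordant - discordant) / n_pairs


# Toy data (invented): three models, 31 pairwise matches.
toy = ([(0, 1)] * 6 + [(1, 0)] * 5 + [(0, 2)] * 7 + [(2, 0)] * 3
       + [(1, 2)] * 7 + [(2, 1)] * 3)
before = fit_bradley_terry(toy, 3)
after = fit_bradley_terry(flip(toy, 0), 3)
print(top_k(before, 1), top_k(after, 1), kendall_tau(before, after))
```

In this toy data, flipping a single match between the two closest models swaps which one is ranked first. A targeted attack in the spirit of the abstract would search for the few matches whose perturbation moves the ranking most; the exhaustive refit used here scales poorly, which is presumably why the paper turns to influence-based approximations.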
