ELSPR: Evaluator LLM Training Data Self-Purification on Non-Transitive Preferences via Tournament Graph Reconstruction

Yan Yu; Yilun Liu; Minggui He; Shimin Tao; Weibin Meng; Xinhua Yang; Li Zhang; Hongxia Ma; Dengye Li; Daimeng Wei; Boxing Chen; Fuliang Li

arXiv:2505.17691·cs.CL·December 3, 2025

TL;DR

ELSPR introduces a graph-theoretic framework to identify and filter out ambiguous, non-transitive preference data in LLM evaluation, significantly improving ranking reliability and model discriminative power.

Contribution

This paper presents ELSPR, a novel method that models pairwise preferences as tournament graphs and systematically filters problematic data to enhance LLM evaluation consistency.

Findings

1. 13.8% reduction in non-transitivity
2. 0.088 decrease in structural entropy
3. Improved human and model evaluation agreement

Abstract

Pairwise evaluation of large language models (LLMs) has become the dominant paradigm for benchmarking open-ended tasks, yet non-transitive preferences, where evaluators prefer A over B, B over C, but C over A, fundamentally undermine ranking reliability. We show that this critical issue stems largely from low-quality data that contains inherently ambiguous preference pairs. To address this challenge, we propose ELSPR, a principled graph-theoretic framework that models pairwise preferences as tournament graphs and systematically identifies problematic training data. ELSPR quantifies non-transitivity through strongly connected components (SCCs) analysis and measures overall preference clarity using a novel normalized directed graph structural entropy metric. Our filtering methodology selectively removes preference data that induce non-transitivity while preserving transitive preferences.…
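To make the tournament-graph idea in the abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes each preference is a directed edge `(winner, loser)` and that every pair of responses is compared once, so the graph is a tournament. Under that assumption the preferences are fully transitive exactly when every strongly connected component is a single node, so any edge whose two endpoints share a component lies on a preference cycle. The function names `strongly_connected_components` and `split_preferences` are hypothetical, and flagging every intra-component edge is an illustrative rule only; the paper's actual filtering criterion and its structural entropy metric are not reproduced here.

```python
from collections import defaultdict


def strongly_connected_components(nodes, edges):
    """Return the SCCs of a directed graph using Kosaraju's algorithm.

    `edges` is a list of (winner, loser) pairs. Both passes are iterative
    so large graphs do not hit Python's recursion limit.
    """
    fwd = defaultdict(list)
    rev = defaultdict(list)
    for winner, loser in edges:
        fwd[winner].append(loser)
        rev[loser].append(winner)

    # Pass 1: record nodes in order of DFS completion on the forward graph.
    visited = set()
    order = []
    for start in nodes:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(fwd[start]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(fwd[nxt])))
                    break
            else:
                # All successors explored: this node is finished.
                order.append(node)
                stack.pop()

    # Pass 2: walk the reversed graph in decreasing finish order.
    assigned = set()
    components = []
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        component = []
        todo = [start]
        while todo:
            node = todo.pop()
            component.append(node)
            for prev in rev[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    todo.append(prev)
        components.append(sorted(component))
    return components


def split_preferences(nodes, edges):
    """Split edges into (flagged, kept).

    Assumption for illustration: an edge is flagged when both endpoints
    fall in the same SCC, i.e. it lies on some preference cycle.
    """
    component_of = {}
    for index, component in enumerate(strongly_connected_components(nodes, edges)):
        for node in component:
            component_of[node] = index
    flagged = [e for e in edges if component_of[e[0]] == component_of[e[1]]]
    kept = [e for e in edges if component_of[e[0]] != component_of[e[1]]]
    return flagged, kept


if __name__ == "__main__":
    # A beats B, B beats C, C beats A (a cycle); all three beat D.
    responses = ["A", "B", "C", "D"]
    preferences = [("A", "B"), ("B", "C"), ("C", "A"),
                   ("A", "D"), ("B", "D"), ("C", "D")]
    flagged, kept = split_preferences(responses, preferences)
    print("flagged:", flagged)
    print("kept:", kept)
```

In this toy example the three edges among A, B, and C form a cycle and are flagged, while the three edges into D are kept, mirroring the abstract's goal of removing cycle-inducing preferences while preserving transitive ones.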
