Generalised Probabilistic Modelling and Improved Uncertainty Estimation in Comparative LLM-as-a-judge

Yassir Fathullah; Mark J. F. Gales

arXiv:2505.15240·cs.AI·May 22, 2025

Open Access

TL;DR

This paper generalises the Product-of-Experts framework used in comparative LLM-as-a-judge, improves uncertainty estimates for individual comparisons and for the overall ranking, and cuts the number of comparisons needed for reliable rankings by roughly 50%.

Contribution

The paper introduces a generalised probabilistic framework, proposes improved uncertainty estimation methods, and demonstrates efficiency gains in comparative LLM evaluation.

Findings

01

Selecting comparisons with the proposed uncertainty estimates, especially the probability of reordering, reduces the number of comparisons needed by ~50%.

02

Combining absolute and comparative scores improves ranking performance.

03

Ranking uncertainty metrics help identify low-quality predictions.

Abstract

This paper explores generalised probabilistic modelling and uncertainty estimation in comparative LLM-as-a-judge frameworks. We show that existing Product-of-Experts methods are special cases of a broader framework, enabling diverse modelling options. Furthermore, we propose improved uncertainty estimates for individual comparisons, enabling more efficient selection and achieving strong performance with fewer evaluations. We also introduce a method for estimating overall ranking uncertainty. Finally, we demonstrate that combining absolute and comparative scoring improves performance. Experiments show that the specific expert model has a limited impact on final rankings, but our proposed uncertainty estimates, especially the probability of reordering, significantly improve the efficiency of systems, reducing the number of needed comparisons by ~50%. Furthermore, ranking-level uncertainty…
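To make the Product-of-Experts idea concrete, here is a minimal sketch, not the paper's exact formulation. It assumes each pairwise judgement acts as a Gaussian expert on the score difference between two candidates, that the judge probability is mapped to a target difference with a logit, and that a pair's "probability of reordering" can be approximated as the posterior probability that its score difference has the opposite sign. The function names `poe_gaussian_scores` and `reorder_probability`, the `sigma` and `prior_precision` parameters, and the logit mapping are all illustrative assumptions.

```python
import math
import numpy as np


def poe_gaussian_scores(n_items, comparisons, sigma=1.0, prior_precision=1e-3):
    """Combine pairwise judgements into a Gaussian posterior over scores.

    comparisons: list of (i, j, p_ij), where p_ij is the judge's probability
    that item i is better than item j. Each comparison is treated as a
    Gaussian expert on s_i - s_j (an assumption made for this sketch).
    """
    # A weak isotropic prior keeps the precision matrix invertible.
    precision = prior_precision * np.eye(n_items)
    b = np.zeros(n_items)
    for i, j, p in comparisons:
        p = min(max(p, 1e-6), 1.0 - 1e-6)   # avoid infinite logits
        mu = math.log(p / (1.0 - p))        # assumed target for s_i - s_j
        w = np.zeros(n_items)
        w[i], w[j] = 1.0, -1.0
        # Multiplying Gaussian experts adds their precisions.
        precision += np.outer(w, w) / sigma**2
        b += w * mu / sigma**2
    cov = np.linalg.inv(precision)
    mean = cov @ b
    return mean, cov


def reorder_probability(mean, cov, i, j):
    """Posterior probability that items i and j swap order (sketch only)."""
    diff = mean[i] - mean[j]
    var = cov[i, i] + cov[j, j] - 2.0 * cov[i, j]
    z = abs(diff) / math.sqrt(var)
    # Standard normal CDF at -z, written with the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Three candidates, two judged pairs (hypothetical judge probabilities).
comparisons = [(0, 1, 0.9), (1, 2, 0.8)]
mean, cov = poe_gaussian_scores(3, comparisons)
ranking = [int(k) for k in np.argsort(-mean)]
# Under this sketch, the pair most likely to reorder is a natural next query.
candidates = [(0, 1), (0, 2), (1, 2)]
next_pair = max(candidates, key=lambda ij: reorder_probability(mean, cov, *ij))
```

Under these assumptions, a selection loop would repeatedly query the judge on `next_pair`, append the result to `comparisons`, and refit, stopping once every pair's reordering probability falls below a chosen threshold.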

Taxonomy

Topics: Natural Language Processing Techniques