LLM-as-Judge on a Budget

Aadirupa Saha; Aniket Wagde; Branislav Kveton

arXiv:2602.15481 · cs.LG · April 14, 2026

Abstract

LLM-as-a-judge has emerged as a cornerstone technique for evaluating large language models by leveraging LLM reasoning to score prompt-response pairs. Since LLM judgments are stochastic, practitioners commonly query each pair multiple times to estimate mean scores accurately. This raises a critical challenge: given a fixed computational budget $B$, how should queries be allocated across $K$ prompt-response pairs to minimize estimation error? We present a principled variance-adaptive approach leveraging multi-armed bandit theory and concentration inequalities. Our method dynamically allocates queries based on estimated score variances, concentrating resources where uncertainty is highest. Further, our algorithm is shown to achieve a worst-case score-estimation error of $\tilde{O}\left(\frac{\sum_{i=1}^{K} \sigma_i^2}{B}\right)$, $\sigma_i^2$ being the unknown score variance for…
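To make the allocation idea concrete, here is a minimal Python sketch of a greedy variance-adaptive heuristic. It is not the paper's algorithm, which this excerpt does not specify. It assumes a hypothetical `judge(i)` callable that returns one noisy score for pair `i`, and the names `allocate_queries` and `warmup` are illustrative. After a short warm-up, each remaining query goes to the pair whose estimated mean is currently least certain, measured by sample variance divided by the number of samples.

```python
import random
import statistics
from typing import Callable, List


def allocate_queries(
    judge: Callable[[int], float],  # hypothetical: one noisy score for pair i
    num_pairs: int,
    budget: int,
    warmup: int = 2,
) -> List[List[float]]:
    """Greedy variance-adaptive allocation (illustrative heuristic only)."""
    if warmup < 2:
        raise ValueError("warmup must be at least 2 to estimate a variance")
    if budget < warmup * num_pairs:
        raise ValueError("budget too small for the warm-up phase")

    # Warm-up: query every pair a few times so a sample variance exists.
    scores = [[judge(i) for _ in range(warmup)] for i in range(num_pairs)]

    for _ in range(budget - warmup * num_pairs):
        # Estimated variance of each mean estimate: s_i^2 / n_i.
        uncertainty = [statistics.variance(s) / len(s) for s in scores]
        # Spend the next query on the currently most uncertain pair.
        target = max(range(num_pairs), key=uncertainty.__getitem__)
        scores[target].append(judge(target))
    return scores


if __name__ == "__main__":
    rng = random.Random(0)
    sigmas = [0.1, 0.5, 2.0]  # assumed per-pair noise levels, for the demo only

    def noisy_judge(i: int) -> float:
        return rng.gauss(5.0, sigmas[i])

    samples = allocate_queries(noisy_judge, num_pairs=3, budget=300)
    print("queries per pair:", [len(s) for s in samples])
    print("estimated means:", [round(statistics.mean(s), 2) for s in samples])
```

Under this rule the per-pair query counts tend to grow roughly in proportion to the estimated variances, so noisier pairs receive more of the budget. A practical version would likely add confidence-bound corrections so that an unluckily small early variance estimate does not starve a pair of queries.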
