Best Arm Identification with LLM Judges and Limited Human

Ruicheng Ao; Hongyu Chen; Siyang Gao; Hanwei Li; David Simchi-Levi

arXiv:2601.21471·cs.LG·January 30, 2026


TL;DR

This paper introduces a bias-aware method for best-arm identification using biased LLM judges and limited human audits, achieving efficient resource allocation and near-oracle performance.

Contribution

It develops a bias-corrected estimator and an adaptive algorithm for best-arm identification with biased proxies and limited ground truth audits.

Findings

01

The proposed estimator effectively corrects bias in proxy scores.

02

The adaptive algorithm concentrates audits on uncertain and close arms.

03

Numerical results show superior empirical performance, and the method comes with theoretical guarantees.

Abstract

We study fixed-confidence best-arm identification (BAI) where a cheap but potentially biased proxy (e.g., LLM judge) is available for every sample, while an expensive ground-truth label can only be acquired selectively when using a human for auditing. Unlike classical multi-fidelity BAI, the proxy is biased (arm- and context-dependent) and ground truth is selectively observed. Consequently, standard multi-fidelity methods can mis-select the best arm, and uniform auditing, though accurate, wastes scarce resources and is inefficient. We prove that without bias correction and propensity adjustment, mis-selection probability may not vanish (even with unlimited proxy data). We then develop an estimator for the mean of each arm that combines proxy scores with inverse-propensity-weighted residuals and form anytime-valid confidence sequences for that estimator. Based on the estimator and…
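The estimator described in the abstract, proxy scores plus inverse-propensity-weighted residuals, can be illustrated with a small simulation. This is a minimal sketch of the standard augmented form of such an estimator, not necessarily the paper's exact construction. The function name `bias_corrected_mean`, the constant audit probability `audit_prob`, and the synthetic bias model are all assumptions made for illustration; in the paper's setting the bias and the audit propensities may depend on arm and context.

```python
import random

def bias_corrected_mean(proxy, audited, propensity, truth):
    """Average of proxy scores plus inverse-propensity-weighted residuals.

    proxy[i]      : cheap judge score for sample i (always observed)
    audited[i]    : True if a human label was collected for sample i
    propensity[i] : probability with which sample i was selected for audit
    truth[i]      : human label if audited, otherwise ignored (may be None)
    """
    total = 0.0
    for s, a, p, y in zip(proxy, audited, propensity, truth):
        # The residual correction is applied only to audited samples and is
        # up-weighted by 1/p to account for selective observation.
        residual = (y - s) / p if a else 0.0
        total += s + residual
    return total / len(proxy)

# Synthetic single-arm example (all numbers are illustrative assumptions).
random.seed(0)
n = 20000
audit_prob = 0.1          # hypothetical constant audit propensity
true_mean = 0.6           # mean of the ground-truth label

proxy, audited, truth = [], [], []
for _ in range(n):
    y = 1.0 if random.random() < true_mean else 0.0
    # Biased proxy: compresses the label and shifts it upward, plus noise.
    s = 0.5 * y + 0.4 + random.gauss(0.0, 0.1)
    a = random.random() < audit_prob
    proxy.append(s)
    audited.append(a)
    truth.append(y if a else None)   # label is hidden unless audited

naive = sum(proxy) / n
estimate = bias_corrected_mean(proxy, audited, [audit_prob] * n, truth)

print(f"true mean        : {true_mean:.3f}")
print(f"proxy-only mean  : {naive:.3f}")
print(f"corrected mean   : {estimate:.3f}")
```

In this toy setup the proxy-only average stays near 0.7 however much proxy data is collected, while the corrected average recovers the true mean of 0.6 using human labels on roughly 10% of samples. The example covers only the point estimate; the anytime-valid confidence sequences and the adaptive audit allocation mentioned in the abstract are not reproduced here.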

Taxonomy

Topics: Imbalanced Data Classification Techniques · Machine Learning and Algorithms · Adversarial Robustness in Machine Learning