Adaptive Prediction-Powered AutoEval with Reliability and Efficiency Guarantees

Sangwoo Park; Matteo Zecchin; Osvaldo Simeone

arXiv:2505.18659·stat.ML·December 3, 2025

TL;DR

The paper introduces R-AutoEval+, an adaptive evaluation framework that provides reliability guarantees for AI model evaluation while improving sample efficiency. It dynamically balances its reliance on synthetic and real data, and it is validated on multiple LLM evaluation tasks.

Contribution

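The paper proposes an adaptive evaluation method with finite-sample reliability guarantees. The method adjusts how much it relies on synthetic data according to the accuracy of the autoevaluator, with the aim of improving sample efficiency over conventional evaluation that uses only real-world data.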

Findings

01. R-AutoEval+ provides reliable model evaluation with finite-sample guarantees.

02. The framework improves sample efficiency over conventional methods.

03. Experiments confirm effectiveness across various LLM evaluation tasks.

Abstract

Selecting artificial intelligence (AI) models, such as large language models (LLMs), from multiple candidates requires accurate performance estimation. This is ideally achieved through empirical evaluations involving abundant real-world data. However, such evaluations are costly and impractical at scale. To address this challenge, autoevaluation methods leverage synthetic data produced by automated evaluators, such as LLMs-as-judges, reducing variance but potentially introducing bias. Recent approaches have employed semi-supervised prediction-powered inference (PPI) to correct for the bias of autoevaluators. However, the use of autoevaluators may lead in practice to a degradation in sample efficiency compared to conventional methods using only real-world data. In this paper, we propose R-AutoEval+, a novel framework that provides finite-sample reliability guarantees on the model…
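The bias correction that the abstract attributes to prediction-powered inference can be illustrated with a minimal sketch. This is the generic PPI point estimate of a mean with a fixed weight, not the R-AutoEval+ procedure itself, and it carries no finite-sample guarantee. The function name `ppi_mean`, the weight `lam`, and the toy arrays are all assumptions made for illustration. Setting `lam=0` ignores the autoevaluator and uses only real labels, while `lam=1` gives the standard PPI estimate.

```python
import numpy as np

def ppi_mean(real_labels, judge_on_real, judge_on_synthetic, lam=1.0):
    """Generic PPI-style estimate of mean performance (illustrative only).

    real_labels        : ground-truth scores on the small labeled set
    judge_on_real      : autoevaluator scores on that same labeled set
    judge_on_synthetic : autoevaluator scores on the large unlabeled set
    lam                : hypothetical weight on the autoevaluator (0 = real data only)
    """
    real_labels = np.asarray(real_labels, dtype=float)
    judge_on_real = np.asarray(judge_on_real, dtype=float)
    judge_on_synthetic = np.asarray(judge_on_synthetic, dtype=float)

    # Bias correction measured where both real labels and judge scores exist.
    rectifier = np.mean(real_labels - lam * judge_on_real)

    # Autoevaluator average on the large set, shifted by the correction.
    return lam * np.mean(judge_on_synthetic) + rectifier

# Toy data (made up): the judge is overly optimistic on the labeled set.
real = [1, 0, 1, 1]
judge_real = [1, 1, 1, 1]
judge_synth = [1, 1, 1, 0, 1, 1, 1, 1]

est_ppi = ppi_mean(real, judge_real, judge_synth, lam=1.0)
est_real_only = ppi_mean(real, judge_real, judge_synth, lam=0.0)
```

With a fixed `lam`, a poor autoevaluator can make the estimate noisier than using real data alone, which is the loss of sample efficiency the abstract describes. An adaptive choice of this reliance is the gap the paper targets.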

Code & Models

Repositories

kclip/r_autoeval_plus (Official)

Videos

Adaptive Prediction-Powered AutoEval with Reliability and Efficiency Guarantees · SlidesLive