TL;DR
MultiPPI is a framework for statistical estimation that allocates a fixed budget across expensive, high-quality measurements and cheaper, lower-quality proxies. It comes with theoretical guarantees and achieves lower estimation error than existing baselines in large language model evaluations.
Contribution
The paper introduces Multiple-Prediction-Powered Inference (MultiPPI), a method with theoretical guarantees for allocating resources across multiple predictors in order to improve estimation accuracy.
Findings
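MultiPPI consistently achieves lower estimation error than existing baselines.
The estimator comes with theoretical guarantees on minimax optimality, finite-sample performance, and asymptotic normality.
Experiments across three LLM evaluation scenarios support its effectiveness, with the advantage attributed to a budget-adaptive allocation strategy.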
Abstract
Statistical estimation often involves tradeoffs between expensive, high-quality measurements and a variety of lower-quality proxies. We introduce Multiple-Prediction-Powered Inference (MultiPPI): a general framework for constructing statistically efficient estimates by optimally allocating resources across these diverse data sources. This work provides theoretical guarantees about the minimax optimality, finite-sample performance, and asymptotic normality of the MultiPPI estimator. Through experiments across three diverse large language model (LLM) evaluation scenarios, we show that MultiPPI consistently achieves lower estimation error than existing baselines. This advantage stems from its budget-adaptive allocation strategy, which strategically combines subsets of models by learning their complex cost and correlation structures.
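To make the tradeoff between expensive measurements and cheap proxies concrete, here is a minimal sketch of the simpler single-proxy setting that prediction-powered estimators build on. It is not the MultiPPI estimator from the paper: it handles one proxy only and ignores costs and budget allocation. The function name `ppi_mean`, its arguments, and the variance-minimizing choice of the weight `lam` are assumptions made for illustration.

```python
from statistics import mean


def ppi_mean(y_labeled, f_labeled, f_unlabeled, lam=None):
    """Estimate E[Y] from a few expensive labels plus many cheap proxy scores.

    Hypothetical single-proxy sketch (not the paper's MultiPPI method).
    y_labeled:   expensive measurements on n items
    f_labeled:   proxy scores on the same n items
    f_unlabeled: proxy scores on N additional items without labels
    lam:         weight on the proxy correction; None picks the
                 variance-minimizing weight from the labeled sample
    """
    n, big_n = len(y_labeled), len(f_unlabeled)
    if len(f_labeled) != n:
        raise ValueError("y_labeled and f_labeled must have the same length")
    if n == 0 or big_n == 0:
        raise ValueError("both samples must be non-empty")

    y_bar, f_bar = mean(y_labeled), mean(f_labeled)

    if lam is None:
        if n < 2:
            raise ValueError("need at least two labeled items to tune lam")
        # Sample covariance of (Y, f) and sample variance of f on labeled data.
        cov = sum((y - y_bar) * (f - f_bar)
                  for y, f in zip(y_labeled, f_labeled)) / (n - 1)
        var = sum((f - f_bar) ** 2 for f in f_labeled) / (n - 1)
        # Minimizes the variance of y_bar + lam * (mean(f_unlabeled) - f_bar)
        # when the two samples are independent.
        lam = cov / ((1 + n / big_n) * var) if var > 0 else 0.0

    # Labeled mean, corrected by how far the labeled proxies sit from the
    # proxy mean on the larger unlabeled pool.
    return y_bar + lam * (mean(f_unlabeled) - f_bar)


# Usage: lam=0 ignores the proxy entirely; lam=1 trusts it fully.
labels = [1.0, 2.0, 3.0]
proxy_on_labeled = [0.0, 1.0, 2.0]
proxy_on_pool = [3.0, 3.0]
estimate = ppi_mean(labels, proxy_on_labeled, proxy_on_pool)
```

With `lam=0` the estimate reduces to the plain mean of the labels, and with `lam=1` it equals the proxy mean plus the average labeled residual. Per the abstract, MultiPPI goes further by choosing which subsets of models to query under a budget, using their cost and correlation structure.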