Robust LLM Performance Certification via Constrained Maximum Likelihood Estimation
Minghe Shen, Ananth Balashankar, Adam Fisch, David Madras, Miguel Rodrigues

TL;DR
This paper introduces a constrained maximum likelihood estimation method for estimating LLM failure rates. It combines a small set of human labels, a large set of LLM-judge annotations, and domain constraints on judge performance, and it outperforms existing approaches.
Contribution
The paper presents a novel, practical constrained MLE approach that integrates human labels, LLM-judge annotations, and domain knowledge about judge performance to estimate LLM failure rates more accurately.
Findings
Constrained MLE outperforms state-of-the-art baselines across various settings.
The method provides more accurate and lower-variance failure rate estimates than the baselines.
Empirical validation demonstrates robustness across diverse experimental regimes.
Abstract
The ability to rigorously estimate the failure rates of large language models (LLMs) is a prerequisite for their safe deployment. Currently, however, practitioners often face a tradeoff between expensive human gold standards and potentially severely biased automatic annotation schemes such as "LLM-as-a-Judge" labeling. In this paper, we propose a new, practical, and efficient approach to LLM failure rate estimation based on constrained maximum-likelihood estimation (MLE). Our method integrates three distinct signal sources: (i) a small, high-quality human-labeled calibration set, (ii) a large corpus of LLM-judge annotations, and, most importantly, (iii) additional side information via domain-specific constraints derived from known bounds on judge performance statistics. We validate our approach through a comprehensive empirical study, benchmarking it against state-of-the-art baselines…
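The abstract does not spell out the likelihood or the constraint set, so the following is only a hypothetical sketch of how the three signal sources could be combined, under stated assumptions. It assumes each output has a binary failure label with failure rate θ. It models the judge by a true-positive rate (tpr) and a false-positive rate (fpr). It assumes the human-labeled calibration items also carry judge labels and reads "known bounds on judge performance statistics" as box constraints on tpr and fpr. Under these assumptions, a judge-only item is flagged with probability q = θ·tpr + (1 − θ)·fpr. All function names are illustrative, not the authors' code. The optimizer is a simple swapped-in choice: a grid over (tpr, fpr) plus a 1-D ternary search over θ, which is valid because the log-likelihood is concave in θ once tpr and fpr are fixed.

```python
import math
from typing import Callable, Iterable, List, Tuple

Bounds = Tuple[float, float]


def _grid(lo: float, hi: float, step: float) -> List[float]:
    """Evenly spaced points covering [lo, hi], both endpoints included."""
    n = max(1, int(round((hi - lo) / step)))
    return [lo + i * (hi - lo) / n for i in range(n + 1)]


def _argmax_concave(f: Callable[[float], float], lo: float, hi: float,
                    iters: int = 60) -> Tuple[float, float]:
    """Ternary search for the maximiser of a concave function on [lo, hi]."""
    for _ in range(iters):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if f(a) < f(b):
            lo = a  # the maximiser cannot lie in [lo, a)
        else:
            hi = b  # the maximiser cannot lie in (b, hi]
    x = (lo + hi) / 2
    return x, f(x)


def log_likelihood(theta: float, tpr: float, fpr: float,
                   cal: Tuple[int, int, int, int], judge: Tuple[int, int]) -> float:
    """Joint log-likelihood of the calibration set and the judge-only corpus (assumed model)."""
    n11, n10, n01, n00 = cal  # (human, judge) = (1,1), (1,0), (0,1), (0,0)
    m1, m0 = judge            # judge-only items: flagged, not flagged
    ll = (n11 + n10) * math.log(theta) + (n01 + n00) * math.log(1 - theta)
    ll += n11 * math.log(tpr) + n10 * math.log(1 - tpr)
    ll += n01 * math.log(fpr) + n00 * math.log(1 - fpr)
    q = theta * tpr + (1 - theta) * fpr  # marginal P(judge flags an unlabeled item)
    ll += m1 * math.log(q) + m0 * math.log(1 - q)
    return ll


def constrained_mle(cal_pairs: Iterable[Tuple[bool, bool]],
                    judge_flags: Iterable[bool],
                    tpr_bounds: Bounds = (0.01, 0.99),
                    fpr_bounds: Bounds = (0.01, 0.99),
                    theta_bounds: Bounds = (0.001, 0.999),
                    step: float = 0.02) -> Tuple[float, float, float]:
    """Maximise the joint likelihood subject to box constraints; returns (theta, tpr, fpr)."""
    for lo, hi in (tpr_bounds, fpr_bounds, theta_bounds):
        if not 0.0 < lo <= hi < 1.0:
            raise ValueError("bounds must satisfy 0 < lo <= hi < 1")
    pairs = list(cal_pairs)
    cal = (sum(1 for h, j in pairs if h and j),
           sum(1 for h, j in pairs if h and not j),
           sum(1 for h, j in pairs if not h and j),
           sum(1 for h, j in pairs if not h and not j))
    flags = [bool(f) for f in judge_flags]
    judge = (sum(flags), len(flags) - sum(flags))
    best = (-math.inf, 0.0, 0.0, 0.0)
    for tpr in _grid(*tpr_bounds, step):
        for fpr in _grid(*fpr_bounds, step):
            # With tpr and fpr fixed, every term is a log of an affine function of theta.
            theta, ll = _argmax_concave(
                lambda t: log_likelihood(t, tpr, fpr, cal, judge), *theta_bounds)
            if ll > best[0]:
                best = (ll, theta, tpr, fpr)
    return best[1], best[2], best[3]


if __name__ == "__main__":
    # Synthetic counts: 100 human-labeled items, 10,000 judge-only items.
    cal = ([(True, True)] * 18 + [(True, False)] * 2
           + [(False, True)] * 4 + [(False, False)] * 76)
    judge = [True] * 2200 + [False] * 7800
    print(constrained_mle(cal, judge, tpr_bounds=(0.85, 0.99),
                          fpr_bounds=(0.01, 0.10), step=0.01))
```

On these synthetic counts the estimate should land near θ ≈ 0.20. Tightening `tpr_bounds` pulls the judge's true-positive rate into the stated range and shifts θ accordingly. A real implementation would more likely hand the same objective to a continuous constrained optimizer. The grid is used here only to keep the sketch dependency-free and easy to check.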
