Fault-Tolerant Evaluation for Sample-Efficient Model Performance Estimators

Zihan Zhu; Yanqiu Wu; Qiongkai Xu

arXiv:2602.07226 · cs.LG · February 10, 2026

Open Access

TL;DR

This paper introduces a fault-tolerant evaluation framework for performance estimators that accounts for bias and variance within adjustable error margins, improving reliability in low-variance settings.

Contribution

The paper proposes a novel method for evaluating sample-efficient performance estimators that integrates bias and variance considerations with an automatic calibration algorithm.

Findings

01. The framework effectively distinguishes bias effects from variance effects.

02. Automatic epsilon calibration improves evaluation reliability.

03. Experiments show practical utility on real-world datasets.

Abstract

In the era of Model-as-a-Service, organizations increasingly rely on third-party AI models for rapid deployment. However, the dynamic nature of emerging AI applications, the continual introduction of new datasets, and the growing number of models claiming superior performance make efficient and reliable validation of model services increasingly challenging. This motivates the development of sample-efficient performance estimators, which aim to estimate model performance by strategically selecting instances for labeling, thereby reducing annotation cost. Yet existing evaluation approaches often fail in low-variance settings: RMSE conflates bias and variance, masking persistent bias when variance is small, while p-value-based tests become hypersensitive, rejecting adequate estimators for negligible deviations. To address this, we propose a fault-tolerant evaluation framework that…
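The two failure modes named in the abstract can be illustrated with a small simulation. The sketch below is not the paper's framework or its calibration algorithm. It only uses the standard identity RMSE² = bias² + variance and a one-sample t-statistic to show why both criteria can mislead when variance is small. The true accuracy (0.80), the bias and noise levels, the trial count, and all function names are hypothetical values chosen for illustration.

```python
import math
import random
import statistics

def simulate(truth, bias, sd, n_trials, seed):
    """Draw repeated estimates from a hypothetical estimator (Gaussian noise assumed)."""
    rng = random.Random(seed)
    return [rng.gauss(truth + bias, sd) for _ in range(n_trials)]

def rmse_parts(estimates, truth):
    """Return (rmse, bias, variance); rmse**2 equals bias**2 + variance."""
    errors = [e - truth for e in estimates]
    bias = statistics.fmean(errors)
    variance = statistics.pvariance(estimates)  # population variance
    rmse = math.sqrt(statistics.fmean([err * err for err in errors]))
    return rmse, bias, variance

def t_statistic(estimates, truth):
    """One-sample t-statistic for the hypothesis that the mean estimate equals truth."""
    standard_error = statistics.stdev(estimates) / math.sqrt(len(estimates))
    return (statistics.fmean(estimates) - truth) / standard_error

TRUTH = 0.80  # assumed true model accuracy

# Persistent bias with tiny variance versus no bias with larger variance:
# the two end up with nearly the same RMSE.
biased_stable = simulate(TRUTH, bias=0.02, sd=0.001, n_trials=1000, seed=0)
unbiased_noisy = simulate(TRUTH, bias=0.0, sd=0.02, n_trials=1000, seed=1)

# Negligible bias with tiny variance: the t-statistic is still very large.
nearly_exact = simulate(TRUTH, bias=0.0005, sd=0.001, n_trials=1000, seed=2)

for name, estimates in [
    ("biased_stable", biased_stable),
    ("unbiased_noisy", unbiased_noisy),
    ("nearly_exact", nearly_exact),
]:
    rmse, bias, variance = rmse_parts(estimates, TRUTH)
    t = t_statistic(estimates, TRUTH)
    print(f"{name}: rmse={rmse:.4f} bias={bias:+.4f} var={variance:.2e} t={t:.1f}")
```

Under these assumed settings, the first two estimators are hard to tell apart by RMSE alone even though only one of them is systematically off, while the third would be rejected by a significance test despite deviating from the truth by a small fraction of a percentage point.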

Taxonomy

Topics: Software System Performance and Reliability · Data Quality and Management · Adversarial Robustness in Machine Learning