Fault-Tolerant Evaluation for Sample-Efficient Model Performance Estimators
Zihan Zhu, Yanqiu Wu, Qiongkai Xu

TL;DR
This paper introduces a fault-tolerant evaluation framework for performance estimators that accounts for bias and variance within adjustable error margins, improving reliability in low-variance settings.
Contribution
It proposes a novel evaluation method that treats bias and variance separately and pairs them with an algorithm that automatically calibrates the error margin (epsilon), supporting more reliable assessment of sample-efficient model performance estimators.
Findings
The framework separates bias effects from variance effects.
Automatically calibrating the error margin (epsilon) improves evaluation reliability.
Experiments show practical utility on real-world datasets.
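To make the bias/variance distinction concrete, the sketch below uses the standard identity MSE = bias² + variance (RMSE is its square root) on repeated runs of an estimator. The function name and the two lists of estimates are hypothetical illustrations, not data or code from the paper. They only show how two estimators can have nearly the same RMSE while one of them carries a persistent bias.

```python
import math
import statistics


def bias_variance_rmse(estimates, true_value):
    """Split the error of repeated estimates into bias and variance.

    Uses the population variance (divide by n), so that
    MSE = bias**2 + variance holds exactly (up to float rounding).
    """
    mean_est = statistics.fmean(estimates)
    bias = mean_est - true_value
    variance = statistics.pvariance(estimates)
    mse = statistics.fmean((e - true_value) ** 2 for e in estimates)
    return bias, variance, math.sqrt(mse)


# Hypothetical repeated runs of two estimators of a true accuracy of 0.80.
TRUE_ACC = 0.80
unbiased_noisy = [0.76, 0.84, 0.78, 0.82, 0.80]  # centred on 0.80, spread out
biased_stable = [0.83, 0.83, 0.83, 0.83, 0.83]   # no spread, always 0.03 too high

if __name__ == "__main__":
    for name, runs in [("unbiased_noisy", unbiased_noisy),
                       ("biased_stable", biased_stable)]:
        b, v, r = bias_variance_rmse(runs, TRUE_ACC)
        print(f"{name}: bias={b:+.4f} variance={v:.5f} rmse={r:.4f}")
```

Here the two RMSE values are about 0.028 and 0.030, so RMSE alone barely separates them, while the bias term shows that only the second estimator is systematically off.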
Abstract
In the era of Model-as-a-Service, organizations increasingly rely on third-party AI models for rapid deployment. However, the dynamic nature of emerging AI applications, the continual introduction of new datasets, and the growing number of models claiming superior performance make efficient and reliable validation of model services increasingly challenging. This motivates the development of sample-efficient performance estimators, which aim to estimate model performance by strategically selecting instances for labeling, thereby reducing annotation cost. Yet existing evaluation approaches often fail in low-variance settings: RMSE conflates bias and variance, masking persistent bias when variance is small, while p-value-based tests become hypersensitive, rejecting adequate estimators for negligible deviations. To address this, we propose a fault-tolerant evaluation framework that…
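The p-value hypersensitivity described above can be reproduced in a few lines. This is a minimal sketch under stated assumptions: it uses a two-sided one-sample z-test (normal approximation via `math.erfc`) as a stand-in for whatever test the compared baselines use, and `within_margin` is a hypothetical tolerance rule with a hand-picked `epsilon`. The abstract is truncated and does not specify the paper's actual criterion or its automatic epsilon calibration, so neither function should be read as the authors' method.

```python
import math
import statistics


def z_test_p_value(errors):
    """Two-sided p-value for H0: mean error == 0 (normal approximation)."""
    n = len(errors)
    mean_err = statistics.fmean(errors)
    std_err = statistics.stdev(errors) / math.sqrt(n)  # standard error of the mean
    z = mean_err / std_err
    return math.erfc(abs(z) / math.sqrt(2))


def within_margin(errors, epsilon):
    """Hypothetical tolerance rule: accept if |mean error| <= epsilon."""
    return abs(statistics.fmean(errors)) <= epsilon


# Hypothetical low-variance estimator: errors cycle through
# 0.0010 .. 0.0030, i.e. a bias of 0.002 (0.2 accuracy points).
errors = [0.002 + 0.0005 * ((i % 5) - 2) for i in range(50)]

if __name__ == "__main__":
    print("p-value:", z_test_p_value(errors))             # tiny, so the test rejects
    print("within 0.01:", within_margin(errors, 0.01))    # tolerance rule accepts
    print("within 0.001:", within_margin(errors, 0.001))  # a stricter margin rejects
```

Because the spread of the errors is so small, the standard error shrinks and even a 0.2-point bias gives a z-score near 20, so the test rejects. A margin-based rule instead asks whether the deviation is larger than what the user is willing to tolerate, which is the kind of adjustable error margin the framework is described as using.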
Taxonomy
Topics: Software System Performance and Reliability · Data Quality and Management · Adversarial Robustness in Machine Learning
