Towards optimal model evaluation: enhancing active testing with actively improved estimators

JooChul Lee; Likhitha Kolla; Jinbo Chen

PMC · DOI: 10.1038/s41598-024-58633-3 · May 9, 2024

Open Access

TL;DR

This paper introduces two estimators for active testing that evaluate pre-trained models without requiring fully labeled test data.

Contribution

The paper proposes two novel estimators, AILUR and AIIPW, for active testing with improved accuracy and efficiency.

Findings

01

The proposed estimators outperform existing active testing methods across four real-world datasets.

02

The methods are robust to subsample size variations and reduce labeling costs effectively.

Abstract

With rapid advancements in machine learning and statistical models, ensuring the reliability of these models through accurate evaluation has become imperative. Traditional evaluation methods often rely on fully labeled test data, a requirement that is becoming increasingly impractical as datasets grow. In this work, we address this issue by extending existing work on active testing (AT) methods, which are designed to sequentially sample and label data for evaluating pre-trained models. We propose two novel estimators, the Actively Improved Levelled Unbiased Risk (AILUR) and the Actively Improved Inverse Probability Weighting (AIIPW) estimators, which are derived from nonparametric smoothing estimation. In addition, a model recalibration process is designed for the AIIPW estimator to optimize the sampling probability within the AT framework. We evaluate the proposed…
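
To illustrate the general idea behind inverse probability weighting in active testing, here is a minimal Python sketch. It is not the AILUR or AIIPW estimator from the paper. It shows only a plain Horvitz-Thompson-style IPW estimate of mean loss when each test point is selected for labeling independently with a known probability. The simulated pool, the surrogate scores, the clipping floor of 0.01, and the function names `ipw_risk_estimate` and `simulate` are all assumptions made for illustration.

```python
import random


def ipw_risk_estimate(losses, probs, sampled):
    """Horvitz-Thompson-style estimate of the mean loss over the whole pool.

    losses[i] is read only where sampled[i] is True (a label was acquired).
    probs[i] is the known probability with which point i was selected.
    """
    n = len(probs)
    total = 0.0
    for loss, p, s in zip(losses, probs, sampled):
        if s:
            total += loss / p  # reweight each labeled point by 1 / p_i
    return total / n


def simulate(n=2000, budget_frac=0.1, n_rep=500, seed=0):
    """Compare the true pool risk with the average IPW estimate (hypothetical data)."""
    rng = random.Random(seed)
    # Hypothetical surrogate score in (0, 1): a guess at how likely the model errs.
    scores = [rng.random() for _ in range(n)]
    # Hypothetical 0/1 loss whose error probability equals the surrogate score.
    losses = [float(rng.random() < s) for s in scores]
    true_risk = sum(losses) / n

    # Sampling probabilities proportional to the score, scaled to the labeling
    # budget and clipped away from 0 so that 1 / p_i stays bounded.
    mean_score = sum(scores) / n
    probs = [min(1.0, max(0.01, budget_frac * s / mean_score)) for s in scores]

    estimates = []
    for _ in range(n_rep):
        sampled = [rng.random() < p for p in probs]  # independent selection
        estimates.append(ipw_risk_estimate(losses, probs, sampled))
    return true_risk, sum(estimates) / n_rep


if __name__ == "__main__":
    true_risk, mean_estimate = simulate()
    print(f"true risk: {true_risk:.4f}, mean IPW estimate: {mean_estimate:.4f}")
```

Because every point has a strictly positive selection probability, the reweighted sum is unbiased for the full-pool risk, so the average over repetitions should sit close to the true value even though only about a tenth of the pool is labeled in each run. Choosing the probabilities well is what controls the variance, which is the aspect the abstract says the recalibration step targets.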

Linked entities

Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.

Species (1)

Homo sapiens (human · species)

Cell lines (2)

10: Mus musculus (Mouse) · Hybridoma
LUR: Homo sapiens (Human) · Induced pluripotent stem cell

Chemicals (4)

alcohol · cocaine · Semeron · Fashion

Diseases (4)

fatty liver disease · impulsivity · NAFLD · AT

Taxonomy

Topics: Machine Learning and Algorithms · Statistical Methods and Inference · Fault Detection and Control Systems