Estimating Model Performance under Domain Shifts with Class-Specific Confidence Scores

Zeju Li, Konstantinos Kamnitsas, Mobarakol Islam, Chen Chen, and Ben Glocker

arXiv:2207.09957 · cs.CV · July 21, 2022

TL;DR

This paper introduces class-wise calibration techniques that improve the estimation of a pre-trained model's performance under domain shift, particularly on class-imbalanced data, yielding more accurate performance estimates for both classification and segmentation tasks.

Contribution

The paper proposes class-specific modifications of confidence-based evaluation methods to better estimate model performance on imbalanced data under domain shifts.
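To make confidence-based performance estimation concrete, the sketch below estimates accuracy from softmax probabilities on unlabeled data, assuming those probabilities are already calibrated. It contrasts a single overall estimate (average confidence) with a per-class recall estimate, which is where class imbalance matters: a high overall estimate can hide poor accuracy on a minority class. The function names are hypothetical and are not taken from the official repository; this illustrates the general technique, not the authors' exact method.

```python
from typing import List, Sequence


def average_confidence(probs: Sequence[Sequence[float]]) -> float:
    """Estimate overall accuracy as the mean max-probability; needs no labels."""
    return sum(max(p) for p in probs) / len(probs)


def classwise_recall_estimate(probs: Sequence[Sequence[float]]) -> List[float]:
    """Estimate per-class recall from (assumed calibrated) probabilities alone.

    p[k] is read as the probability that a sample truly belongs to class k, so:
      expected correct predictions of class k = sum of p[k] over samples predicted as k
      expected number of class-k samples      = sum of p[k] over all samples
    """
    num_classes = len(probs[0])
    correct = [0.0] * num_classes
    support = [0.0] * num_classes
    for p in probs:
        pred = max(range(num_classes), key=lambda k: p[k])  # argmax prediction
        correct[pred] += p[pred]
        for k in range(num_classes):
            support[k] += p[k]
    return [c / s if s > 0 else 0.0 for c, s in zip(correct, support)]
```

For example, with probabilities `[[0.9, 0.1], [0.8, 0.2], [0.4, 0.6]]`, the overall estimate is about 0.767, while the per-class recall estimates are about 0.810 and 0.667. Both estimates are only as good as the calibration of the probabilities they are computed from.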

Findings

1. Improved accuracy estimation by 18% in classification tasks.
2. Doubled estimation accuracy in image segmentation.
3. Effective handling of class imbalance in performance estimation.
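For the segmentation finding, one way to carry a confidence-based estimate over to segmentation is to predict the Dice similarity coefficient from per-pixel foreground probabilities, without ground-truth masks. The sketch below assumes the metric of interest is Dice and that the probabilities are calibrated; `estimated_dice` and its 0.5 threshold are hypothetical choices for illustration, not the paper's formulation.

```python
from typing import Sequence


def estimated_dice(fg_probs: Sequence[float], threshold: float = 0.5) -> float:
    """Label-free Dice estimate for one foreground class (hypothetical helper).

    The hard mask is fg_probs >= threshold. Treating each probability as the
    chance that the pixel is truly foreground, the expected overlap with the
    unknown ground truth is the summed probability inside the mask, and the
    expected ground-truth size is the summed probability over all pixels.
    """
    pred = [p >= threshold for p in fg_probs]
    expected_tp = sum(p for p, m in zip(fg_probs, pred) if m)
    pred_size = sum(pred)  # number of predicted foreground pixels
    expected_gt_size = sum(fg_probs)
    denom = pred_size + expected_gt_size
    # Convention: if both masks are (expected to be) empty, Dice is 1.
    return 2.0 * expected_tp / denom if denom > 0 else 1.0
```

With probabilities `[0.9, 0.8, 0.3, 0.1]`, the first two pixels form the predicted mask, the expected overlap is 1.7 and the expected ground-truth size is 2.1, so the estimate is 3.4 / 4.1 ≈ 0.829.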

Abstract

Machine learning models are typically deployed in a test setting that differs from the training setting, potentially leading to decreased model performance because of domain shift. If we could estimate the performance that a pre-trained model would achieve on data from a specific deployment setting, for example a certain clinic, we could judge whether the model could safely be deployed or if its performance degrades unacceptably on the specific data. Existing approaches estimate this based on the confidence of predictions made on unlabeled test data from the deployment's domain. We find existing methods struggle with data that present class imbalance, because the methods used to calibrate confidence do not account for bias induced by class imbalance, consequently failing to estimate class-wise accuracy. Here, we introduce class-wise calibration within the framework of performance…
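The abstract attributes the failure of existing estimators to calibration that ignores class imbalance. Below is a minimal sketch of class-wise calibration, assuming one temperature per class that divides that class's logit, fitted on labeled validation data by minimizing negative log-likelihood with a coarse coordinate-wise grid search. This parameterization (equivalent to vector scaling without bias terms) and all function names are assumptions for illustration; the paper's exact formulation may differ.

```python
import math
from typing import List, Optional, Sequence


def classwise_softmax(logits: Sequence[float], temps: Sequence[float]) -> List[float]:
    """Softmax after dividing the logit of class k by its own temperature temps[k]."""
    scaled = [z / t for z, t in zip(logits, temps)]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nll(logits_set: Sequence[Sequence[float]], labels: Sequence[int],
        temps: Sequence[float]) -> float:
    """Mean negative log-likelihood of the true labels under class-wise scaling."""
    total = 0.0
    for logits, y in zip(logits_set, labels):
        total -= math.log(classwise_softmax(logits, temps)[y] + 1e-12)
    return total / len(labels)


def fit_classwise_temperatures(logits_set: Sequence[Sequence[float]],
                               labels: Sequence[int],
                               grid: Optional[Sequence[float]] = None,
                               sweeps: int = 3) -> List[float]:
    """Coordinate-wise grid search for one temperature per class (hypothetical helper)."""
    if grid is None:
        grid = [0.25 * i for i in range(1, 21)]  # candidate temperatures 0.25 .. 5.0
    num_classes = len(logits_set[0])
    temps = [1.0] * num_classes  # start from the uncalibrated model
    best = nll(logits_set, labels, temps)
    for _ in range(sweeps):
        for k in range(num_classes):
            for t in grid:
                trial = list(temps)
                trial[k] = t
                loss = nll(logits_set, labels, trial)
                if loss < best:  # keep a change only if it lowers the NLL
                    best, temps = loss, trial
    return temps
```

In a performance-estimation pipeline, the temperatures would be fitted on labeled source-domain validation data and then applied to the unlabeled deployment logits before computing confidence-based estimates. A single shared temperature rescales every class identically, so it cannot correct over- or under-confidence that differs between majority and minority classes; one parameter per class can.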

Code & Models

Repositories

zerojumpline/modelevaluationunderclassimbalance
PyTorch · Official

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI · Machine Learning in Healthcare

Methods: Test