Calibrating for Class Weights by Modeling Machine Learning

Andrew Caplin, Daniel Martin, and Philip Marx

arXiv:2205.04613 · cs.LG · August 2, 2022

Open Access

TL;DR

This paper investigates the conflict between calibration and class weighting in machine learning, proposing a model-based method to recover likelihoods from miscalibrated algorithms, validated on pneumonia detection data.

Contribution

The paper introduces a model-based explanation for the miscalibration caused by class weighting and offers a simple method to recover likelihoods from the affected models.

Findings

01 The proposed method recovers likelihoods from models that are miscalibrated due to class weighting.

02 Validation on the binary pneumonia detection task of Rajpurkar, Irvin, Zhu, et al. (2017) shows improved calibration.

03 The model-based approach clarifies why class weighting conflicts with calibration.

Abstract

A much studied issue is the extent to which the confidence scores provided by machine learning algorithms are calibrated to ground truth probabilities. Our starting point is that calibration is seemingly incompatible with class weighting, a technique often employed when one class is less common (class imbalance) or with the hope of achieving some external objective (cost-sensitive learning). We provide a model-based explanation for this incompatibility and use our anthropomorphic model to generate a simple method of recovering likelihoods from an algorithm that is miscalibrated due to class weighting. We validate this approach in the binary pneumonia detection task of Rajpurkar, Irvin, Zhu, et al. (2017).
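
To make the recovery idea concrete, here is a minimal sketch based on a standard identity for weighted log loss. It is an illustration under stated assumptions, not necessarily the authors' exact procedure: it assumes a binary classifier trained to convergence on log loss with the positive class weighted by a known factor w (negative class weight 1), so that its score s relates to the true likelihood p by s = w·p / (w·p + 1 − p). Inverting gives p = s / (s + w·(1 − s)). The function names below are hypothetical.

```python
def weighted_score(p: float, w: float) -> float:
    """Score that minimizes expected weighted log loss.

    Assumption: positive examples carry weight w, negatives weight 1,
    and p is the true probability of the positive class.
    """
    return w * p / (w * p + (1.0 - p))


def recover_likelihood(s: float, w: float) -> float:
    """Invert weighted_score: map a class-weighted score back to p."""
    return s / (s + w * (1.0 - s))


if __name__ == "__main__":
    p_true, w = 0.1, 5.0
    s = weighted_score(p_true, w)      # inflated score, about 0.357
    p_hat = recover_likelihood(s, w)   # recovers the original 0.1
    print(f"score={s:.3f} recovered={p_hat:.3f}")
```

With w = 1 both functions reduce to the identity, which matches the intuition that an unweighted log-loss minimizer needs no correction.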

Taxonomy

Topics: Imbalanced Data Classification Techniques · Anomaly Detection Techniques and Applications · Machine Learning and Data Classification