Detecting and Correcting for Label Shift with Black Box Predictors

Zachary C. Lipton; Yu-Xiang Wang; Alex Smola

arXiv:1802.03916 · cs.LG · July 27, 2018 · 110 cites

TL;DR

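This paper introduces Black Box Shift Estimation (BBSE), a method for detecting and correcting label shift using an arbitrary black box predictor. BBSE works even when the predictor is biased, inaccurate, or uncalibrated, provided its confusion matrix is invertible. The authors prove that the estimator is consistent, bound its error, and report experiments on its accuracy.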

Contribution

The paper proposes BBSE, a novel approach for estimating and correcting label shift using black box predictors, with theoretical guarantees and applicability to high-dimensional data.

Findings

01 BBSE accurately estimates label distribution shifts.

02 BBSE improves classifier performance under label shift.

03 The method is effective on high-dimensional image datasets.

Abstract

Faced with distribution shift between training and test set, we wish to detect and quantify the shift, and to correct our classifiers without test set labels. Motivated by medical diagnosis, where diseases (targets) cause symptoms (observations), we focus on label shift, where the label marginal $p(y)$ changes but the conditional $p(x \mid y)$ does not. We propose Black Box Shift Estimation (BBSE) to estimate the test distribution $p(y)$. BBSE exploits arbitrary black box predictors to reduce dimensionality prior to shift correction. While better predictors give tighter estimates, BBSE works even when predictors are biased, inaccurate, or uncalibrated, so long as their confusion matrices are invertible. We prove BBSE's consistency, bound its error, and introduce a statistical test that uses BBSE to detect shift. We also leverage BBSE to correct classifiers. Experiments demonstrate accurate…
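
The estimation step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' released code. It assumes hard (argmax) predictions from the black box predictor, and it writes p for the training distribution and q for the test distribution, a notation chosen here for clarity. The function name `bbse_weights` and the toy arrays are hypothetical. The idea is that under label shift the joint confusion matrix C, with entries C[i, j] = p(f(x) = i, y = j) measured on labeled held-out training data, links the predicted-label distribution on test data, mu[i] = q(f(x) = i), to the weights w[j] = q(y = j) / p(y = j) through C w = mu.

```python
import numpy as np

def bbse_weights(val_preds, val_labels, test_preds, num_classes):
    """Estimate label-shift weights w[j] ~ q(y=j) / p(y=j) from hard predictions."""
    # Joint confusion matrix on labeled source data: C[i, j] ~ p(f(x)=i, y=j).
    C = np.zeros((num_classes, num_classes))
    np.add.at(C, (val_preds, val_labels), 1.0)
    C /= len(val_labels)
    # Predicted-label distribution on unlabeled target data: mu[i] ~ q(f(x)=i).
    mu = np.bincount(test_preds, minlength=num_classes) / len(test_preds)
    # Solve C w = mu; this fails if C is singular (the invertibility condition).
    return np.linalg.solve(C, mu)

# Toy data (hypothetical): two classes, a predictor that is right 80% of the time.
val_labels = np.array([0] * 50 + [1] * 50)                       # balanced source labels
val_preds = np.array([0] * 40 + [1] * 10 + [0] * 10 + [1] * 40)  # predictions on them
test_preds = np.array([0] * 74 + [1] * 26)                       # predictions on unlabeled target data

w = bbse_weights(val_preds, val_labels, test_preds, num_classes=2)
p_source = np.bincount(val_labels, minlength=2) / len(val_labels)
q_hat = w * p_source  # estimated target label distribution
print(w, q_hat)
```

In this toy setup the target data were constructed with 90% of examples from class 0, and the estimate recovers that proportion even though the predictor is only 80% accurate. With finite samples the solved weights can be noisy; how to handle that is outside the scope of this sketch.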

Code & Models

Videos

Zack Chase Lipton: The Medical Machine Learning Landscape · YouTube

Taxonomy

Topics: Image Retrieval and Classification Techniques · Machine Learning and Data Classification · Face and Expression Recognition