Learning with Selectively Labeled Data from Multiple Decision-makers

Jian Chen; Zhehao Li; Xiaojie Mao

arXiv:2306.07566 · stat.ML · May 28, 2025 · 1 citation


TL;DR

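This paper studies classification when labels are observed only for cases selected by past decisions, and those decisions were made by multiple decision-makers. It proposes an instrumental variable (IV) framework for identifying the classification risk and a cost-sensitive learning method that yields classifiers robust to the resulting selection bias, whether the risk is exactly or only partially identified.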

Contribution

It introduces a novel instrumental variable (IV) framework for identifying the classification risk and a unified cost-sensitive learning (UCL) approach for learning classifiers that are robust to selection bias.

Findings

01. Conditions under which the classification risk is exactly identified.
02. Tight partial identification bounds on the classification risk when exact identification is not possible.
03. Theoretical and numerical validation of the proposed method's efficacy.
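Findings 01 and 02 can be illustrated with a standard textbook construction: worst-case (Manski-style) bounds on P(Y = 1 | X = x) for a binary label, computed separately for each decision-maker and then intersected. The sketch assumes Y is binary and that the decision-maker identity Z is independent of Y given X, so every decision-maker's interval constrains the same quantity. It is not necessarily the paper's identification result or its tight bounds, and the names `DecisionMakerStats` and `iv_bounds` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DecisionMakerStats:
    """Observed statistics for one decision-maker z in a fixed covariate cell x (hypothetical helper)."""
    label_rate: float     # P(D = 1 | X = x, Z = z): share of cases whose label is observed
    positive_rate: float  # P(Y = 1 | D = 1, X = x, Z = z): positive rate among labeled cases


def iv_bounds(stats: list[DecisionMakerStats]) -> tuple[float, float]:
    """Intersect per-decision-maker worst-case bounds on P(Y = 1 | X = x).

    Assumption: Y is binary and Z is independent of Y given X.
    """
    lower, upper = 0.0, 1.0
    for s in stats:
        observed_positive = s.label_rate * s.positive_rate  # P(Y = 1, D = 1 | x, z)
        unlabeled = 1.0 - s.label_rate                      # P(D = 0 | x, z): labels unknown
        lower = max(lower, observed_positive)               # every unlabeled case negative
        upper = min(upper, observed_positive + unlabeled)   # every unlabeled case positive
    if lower > upper:
        # An empty intersection suggests the independence assumption is violated.
        raise ValueError("empty identified set: IV assumption may be violated")
    return lower, upper
```

With two decision-makers who label 50% and 75% of cases, each seeing a 50% positive rate among labeled cases, the intersected interval is [0.375, 0.625]. If one decision-maker labels every case, the interval collapses to a single point, a simple instance of exact identification.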

Abstract

We study the problem of classification with selectively labeled data, whose distribution may differ from the full population due to historical decision-making. We exploit the fact that in many applications historical decisions were made by multiple decision-makers, each with different decision rules. We analyze this setup under a principled instrumental variable (IV) framework and rigorously study the identification of classification risk. We establish conditions for the exact identification of classification risk and derive tight partial identification bounds when exact identification fails. We further propose a unified cost-sensitive learning (UCL) approach to learn classifiers robust to selection bias in both identification settings. Finally, we theoretically and numerically validate the efficacy of our proposed method.
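The selection bias described in the abstract can be seen in a small simulation. The data-generating process below is entirely hypothetical: a uniform risk score X, a binary outcome with P(Y = 1 | X) = X, two decision-makers assigned at random, and labels observed only for cases a decision-maker accepts (X below that decision-maker's threshold). It compares the error of the trivial all-negative classifier on the full population with the same error estimated from labeled cases only; this illustrates the problem, not the paper's UCL method.

```python
import numpy as np


def simulate_selective_labels(n: int = 100_000, thresholds=(0.3, 0.6), seed: int = 0):
    """Hypothetical data-generating process, for illustration only.

    X: scalar risk score, uniform on [0, 1].
    Y: binary outcome with P(Y = 1 | X) = X.
    Z: decision-maker index, assigned uniformly at random, so Z is
       independent of Y given X (the IV-style assumption).
    D: decision; decision-maker z labels a case (D = 1) only if X < thresholds[z].
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=n)
    y = (rng.uniform(0.0, 1.0, size=n) < x).astype(int)
    z = rng.integers(0, len(thresholds), size=n)
    d = x < np.asarray(thresholds)[z]
    return x, y, z, d


x, y, z, d = simulate_selective_labels()
always_negative = np.zeros_like(y)                   # trivial classifier h(x) = 0
true_risk = np.mean(always_negative != y)            # requires every label
naive_risk = np.mean(always_negative[d] != y[d])     # uses labeled cases only
print(f"population risk: {true_risk:.3f}, labeled-only estimate: {naive_risk:.3f}")
```

Under this design the population risk equals P(Y = 1) = 1/2, while the labeled-only estimate targets E[YD] / P(D = 1) = 0.1125 / 0.45 = 0.25, because both decision-makers label only low-score cases. The simulated values land close to these, so naive evaluation on labeled data roughly halves the apparent error.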


Videos

Learning with Selectively Labeled Data from Multiple Decision-makers· slideslive

Taxonomy

Topics: Water resources management and optimization · Machine Learning and Data Classification · Forecasting Techniques and Applications