Leveraging an Alignment Set in Tackling Instance-Dependent Label Noise
Donna Tjandra, Jenna Wiens

TL;DR
This paper introduces a two-stage method that uses a small set of instances with both observed and ground-truth labels known to improve discriminative performance and reduce bias when training on data with instance-dependent label noise, especially in healthcare applications.
Contribution
The paper proposes a novel two-stage approach using anchor points to effectively handle instance-dependent label noise, improving discriminative performance and fairness.
Findings
Improves AUROC from 0.81 to 0.84 on the MIMIC-III dataset
Reduces bias as measured by AUEOC
Outperforms state-of-the-art methods in noisy label settings
Abstract
Noisy training labels can hurt model performance. Most approaches that aim to address label noise assume that the noise is independent of the input features. In practice, however, label noise is often feature-dependent or instance-dependent, and therefore biased (i.e., some instances are more likely to be mislabeled than others). For example, in clinical care, female patients are more likely to be under-diagnosed for cardiovascular disease than male patients. Approaches that ignore this dependence can produce models with poor discriminative performance and, in many healthcare settings, can exacerbate health disparities. In light of these limitations, we propose a two-stage approach to learning in the presence of instance-dependent label noise. Our approach utilizes anchor points, a small subset of data for which we know both the observed and ground truth labels. On several…
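To make the notion of instance-dependent label noise concrete, the following is a minimal sketch, not the authors' two-stage method. It simulates under-diagnosis whose rate depends on a group attribute, then uses a small subset with both labels known (playing the role of anchor points) to estimate the per-group mislabeling rate. The group names, flip rates, positive prevalence, and function names are all hypothetical values chosen for illustration.

```python
import random


def simulate_labels(n, flip_rate_by_group, prevalence=0.3, seed=0):
    """Return (group, true_label, observed_label) rows.

    Assumption for illustration: only true positives are mislabeled
    (under-diagnosis), with a probability that depends on the group.
    """
    rng = random.Random(seed)
    groups = sorted(flip_rate_by_group)
    rows = []
    for _ in range(n):
        group = rng.choice(groups)
        true_label = 1 if rng.random() < prevalence else 0
        observed = true_label
        # Instance-dependent noise: the flip probability varies by group.
        if true_label == 1 and rng.random() < flip_rate_by_group[group]:
            observed = 0
        rows.append((group, true_label, observed))
    return rows


def estimate_flip_rates(anchor_rows):
    """Estimate P(observed = 0 | true = 1, group) from rows with both labels."""
    positives, flipped = {}, {}
    for group, true_label, observed in anchor_rows:
        if true_label == 1:
            positives[group] = positives.get(group, 0) + 1
            flipped[group] = flipped.get(group, 0) + (observed == 0)
    return {g: flipped[g] / positives[g] for g in positives}


# Hypothetical noise rates: group "B" is under-diagnosed more often than "A".
assumed_rates = {"A": 0.10, "B": 0.40}
data = simulate_labels(20000, assumed_rates, seed=0)

# Treat the first 10% of rows as the subset with known ground truth.
anchor = data[:2000]
estimates = estimate_flip_rates(anchor)

for group in sorted(estimates):
    print(f"group {group}: estimated flip rate {estimates[group]:.2f}")
```

A method that assumes a single, feature-independent noise rate would average over the two groups here and miss the gap that this small labeled subset exposes.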
Taxonomy
Topics: Machine Learning and Data Classification · Machine Learning in Healthcare · Human Pose and Action Recognition
