Metric-DST: Mitigating Selection Bias Through Diversity-Guided Semi-Supervised Metric Learning

Yasin I. Tepeli; Mathijs de Wolf; Joana P. Gonçalves

arXiv:2411.18442 · cs.LG · December 2, 2024

TL;DR

Metric-DST is a diversity-guided self-training approach that uses metric learning and its embedding space to add more diverse unlabeled samples during training, mitigating selection bias and improving fairness on generated and real-world datasets.

Contribution

The paper proposes a diversity-guided self-training method that leverages metric learning to counteract the bias reinforcement caused by confidence-based sample selection.

Findings

01. Learned more robust models under selection bias.

02. Effective on both generated and real-world biased datasets.

03. Improved fairness in molecular biology prediction tasks.

Abstract

Selection bias poses a critical challenge for fairness in machine learning, as models trained on data that is less representative of the population might exhibit undesirable behavior for underrepresented profiles. Semi-supervised learning strategies like self-training can mitigate selection bias by incorporating unlabeled data into model training to gain further insight into the distribution of the population. However, conventional self-training seeks to include high-confidence data samples, which may reinforce existing model bias and compromise effectiveness. We propose Metric-DST, a diversity-guided self-training strategy that leverages metric learning and its implicit embedding space to counter confidence-based bias through the inclusion of more diverse samples. Metric-DST learned more robust models in the presence of selection bias for generated and real-world datasets with induced…
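The contrast the abstract draws between confidence-based and diversity-guided sample selection can be illustrated with a small sketch. This is not the authors' implementation: the function names, the `min_confidence` threshold, the greedy farthest-point rule, and the hand-written toy embeddings are all assumptions chosen for illustration. In Metric-DST the embeddings would come from a learned metric model rather than being fixed by hand.

```python
import math

def select_by_confidence(confidence, k):
    """Conventional self-training: pick the k most confident unlabeled samples."""
    order = sorted(range(len(confidence)), key=lambda i: -confidence[i])
    return order[:k]

def select_diverse(embeddings, confidence, k, min_confidence=0.6):
    """Illustrative diversity-guided pick (assumed rule, not the paper's):
    greedy farthest-point selection among sufficiently confident samples."""
    candidates = [i for i, c in enumerate(confidence) if c >= min_confidence]
    if not candidates or k <= 0:
        return []
    # Seed with the most confident candidate.
    first = max(candidates, key=lambda i: confidence[i])
    chosen = [first]
    # Track each candidate's distance to its nearest already-chosen sample.
    nearest = {i: math.dist(embeddings[i], embeddings[first]) for i in candidates}
    while len(chosen) < min(k, len(candidates)):
        # Add the candidate that lies farthest from everything chosen so far.
        nxt = max(candidates, key=lambda i: nearest[i])
        chosen.append(nxt)
        for i in candidates:
            nearest[i] = min(nearest[i], math.dist(embeddings[i], embeddings[nxt]))
    return chosen

# Toy embedding space: indices 0-3 form a well-represented region with high
# confidence; indices 4-6 form an underrepresented region with lower confidence.
embeddings = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
              (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
confidence = [0.99, 0.98, 0.97, 0.96, 0.70, 0.72, 0.65]

top = select_by_confidence(confidence, k=3)
diverse = select_diverse(embeddings, confidence, k=3)
print("confidence-based:", top)
print("diversity-guided:", diverse)
```

In this toy setup the confidence-based rule draws every pseudo-labeled sample from the well-represented region, while the diversity-guided rule also reaches the underrepresented one, which is the behavior the abstract argues helps under selection bias.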

Code & Models

Repositories

joanagoncalveslab/metric-dst (PyTorch, official)

Taxonomy

Topics: Imbalanced Data Classification Techniques