Metric-DST: Mitigating Selection Bias Through Diversity-Guided Semi-Supervised Metric Learning
Yasin I. Tepeli, Mathijs de Wolf, Joana P. Gonçalves

TL;DR
Metric-DST is a diversity-guided semi-supervised learning approach that uses metric learning to mitigate selection bias and improve fairness in machine learning models, evaluated on generated and real-world datasets.
Contribution
It proposes a novel diversity-guided self-training method leveraging metric learning to counteract confidence-based bias reinforcement.
Findings
Learned more robust models under selection bias.
Effective on both generated and real-world biased datasets.
Improved fairness in molecular biology prediction tasks.
Abstract
Selection bias poses a critical challenge for fairness in machine learning, as models trained on data that is less representative of the population might exhibit undesirable behavior for underrepresented profiles. Semi-supervised learning strategies like self-training can mitigate selection bias by incorporating unlabeled data into model training to gain further insight into the distribution of the population. However, conventional self-training seeks to include high-confidence data samples, which may reinforce existing model bias and compromise effectiveness. We propose Metric-DST, a diversity-guided self-training strategy that leverages metric learning and its implicit embedding space to counter confidence-based bias through the inclusion of more diverse samples. Metric-DST learned more robust models in the presence of selection bias for generated and real-world datasets with induced…
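The contrast the abstract draws between confidence-based and diversity-guided sample selection can be illustrated with a small sketch. This is not the authors' implementation: the function names, the use of greedy farthest-point sampling as the diversity criterion, and the assumption of binary class probabilities are all hypothetical choices made for illustration. It assumes an embedding of the unlabeled samples is already available (for example, from a metric-learning model).

```python
import numpy as np


def select_by_confidence(probs, k):
    """Conventional self-training: pick the k most confident unlabeled samples.

    `probs` holds predicted probabilities for the positive class (binary case).
    """
    confidence = np.abs(np.asarray(probs) - 0.5)
    return np.argsort(-confidence)[:k]


def select_by_diversity(embeddings, k, seed=0):
    """Hypothetical diversity-guided pick: greedy farthest-point sampling.

    Assumes k <= number of samples. This is a stand-in for a diversity
    criterion, not the selection rule used by Metric-DST.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(len(embeddings)))]
    # Distance from every sample to its nearest already-chosen sample.
    min_dist = np.linalg.norm(embeddings - embeddings[chosen[0]], axis=1)
    while len(chosen) < k:
        nxt = int(np.argmax(min_dist))  # sample farthest from the chosen set
        chosen.append(nxt)
        dist_to_nxt = np.linalg.norm(embeddings - embeddings[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_nxt)
    return np.array(chosen)
```

With two well-separated groups of unlabeled samples where the model is confident on only one group, the confidence rule keeps drawing from that group, while the diversity rule spreads its picks across the embedding space.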
Taxonomy
Topics: Imbalanced Data Classification Techniques
