Dataset Distribution Impacts Model Fairness: Single vs. Multi-Task Learning
Ralf Raumanns, Gerard Schouten, Josien P. W. Pluim, Veronika Cheplygina

TL;DR
This study investigates how dataset composition affects fairness in skin lesion classification models, revealing that dataset diversity and learning strategies significantly influence bias and performance across patient groups.
Contribution
The paper introduces a linear programming method to generate biased datasets and compares different learning strategies, highlighting how dataset diversity impacts model fairness.
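To make the idea of a deliberately skewed dataset concrete, here is a minimal sketch of choosing how many images to draw from each (sex, label) cell. It is not the paper's linear program, which this summary does not reproduce. It swaps in a simple closed-form allocation with a feasibility check. The function name `allocate_cells`, the cell names, and the choice to keep the malignant share equal for both sexes (so that sex and label stay uncorrelated) are all assumptions made for illustration.

```python
def allocate_cells(available, total, female_frac, malignant_frac):
    """Return sample counts per (sex, label) cell for a target composition.

    available      -- dict mapping (sex, label) to the number of images on hand
    total          -- desired dataset size
    female_frac    -- desired share of female patients, between 0 and 1
    malignant_frac -- desired share of malignant lesions, applied to both sexes
    """
    n_female = round(total * female_frac)
    n_male = total - n_female

    counts = {}
    for sex, n_sex in (("female", n_female), ("male", n_male)):
        n_malignant = round(n_sex * malignant_frac)
        counts[(sex, "malignant")] = n_malignant
        counts[(sex, "benign")] = n_sex - n_malignant

    # Reject the request if any cell asks for more images than exist.
    shortfall = {
        cell: n - available.get(cell, 0)
        for cell, n in counts.items()
        if n > available.get(cell, 0)
    }
    if shortfall:
        raise ValueError(f"not enough images for cells: {shortfall}")
    return counts


# Hypothetical pool sizes, chosen only to exercise the function.
pool = {
    ("female", "malignant"): 300,
    ("female", "benign"): 2000,
    ("male", "malignant"): 400,
    ("male", "benign"): 2500,
}
plan = allocate_cells(pool, total=1000, female_frac=0.25, malignant_frac=0.2)
print(plan)
```

A linear program becomes useful once several such constraints compete, for example when the requested composition is infeasible and the closest achievable one has to be found instead of raising an error.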
Findings
Sex-specific training data improves subgroup performance.
Single-task models exhibit sex bias.
Adversarial learning can eliminate sex bias in certain cases.
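One way to check a finding such as "single-task models exhibit sex bias" is to score the same model separately on each patient group and compare the results. The sketch below does this with AUC. The summary does not state which metric the paper uses, so AUC is an assumption, as are the helper names `auc` and `auc_by_group` and the toy scores.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case; ties count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_group(labels, scores, groups):
    """Compute AUC separately for each value in `groups` (e.g. patient sex)."""
    result = {}
    for g in sorted(set(groups)):
        idx = [i for i, x in enumerate(groups) if x == g]
        result[g] = auc([labels[i] for i in idx], [scores[i] for i in idx])
    return result


# Toy predictions, invented for illustration only.
labels = [1, 0, 1, 0, 1, 0, 1, 0]
scores = [0.9, 0.1, 0.8, 0.3, 0.6, 0.7, 0.4, 0.2]
groups = ["F", "F", "F", "F", "M", "M", "M", "M"]

per_group = auc_by_group(labels, scores, groups)
gap = max(per_group.values()) - min(per_group.values())
print(per_group, gap)
```

A gap near zero suggests the model performs similarly on both groups. In practice, a large gap would still need a significance test across repeated runs before being read as bias.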
Abstract
The influence of bias in datasets on the fairness of model predictions is a topic of ongoing research in various fields. We evaluate the performance of skin lesion classification using ResNet-based CNNs, focusing on patient sex variations in training data and three different learning strategies. We present a linear programming method for generating datasets with varying patient sex and class labels, taking into account the correlations between these variables. We evaluated the model performance using three different learning strategies: a single-task model, a reinforcing multi-task model, and an adversarial learning scheme. Our observations include: 1) sex-specific training data yields better results, 2) single-task models exhibit sex bias, 3) the reinforcement approach does not remove sex bias, 4) the adversarial model eliminates sex bias in cases involving only female patients, and 5)…
Taxonomy
Topics: Machine Learning and Data Classification · Data Stream Mining Techniques · Big Data and Business Intelligence
