Using Self-supervised Learning Can Improve Model Fairness
Sofia Yfantidou, Dimitris Spathis, Marios Constantinides, Athena Vakali, Daniele Quercia, Fahim Kawsar

TL;DR
This paper investigates how self-supervised learning (SSL) can enhance model fairness across demographic groups, showing that SSL can significantly improve fairness with minimal performance loss compared to supervised methods.
Contribution
The study introduces a comprehensive fairness assessment framework for SSL and demonstrates that SSL models can achieve up to 30% greater fairness across real-world datasets.
Findings
SSL improves fairness up to 30% with minimal performance loss
Representation dissimilarities correlate with demographic performance gaps
SSL models maintain comparable performance to supervised models
Abstract
Self-supervised learning (SSL) has become the de facto training paradigm of large models, where pre-training is followed by supervised fine-tuning using domain-specific data and labels. Despite demonstrating comparable performance with supervised methods, comprehensive efforts to assess SSL's impact on machine learning fairness (i.e., performing equally on different demographic breakdowns) are lacking. Hypothesizing that SSL models would learn more generic, hence less biased representations, this study explores the impact of pre-training and fine-tuning strategies on fairness. We introduce a fairness assessment framework for SSL, comprising five stages: defining dataset requirements, pre-training, fine-tuning with gradual unfreezing, assessing representation similarity conditioned on demographics, and establishing domain-specific evaluation processes. We evaluate our method's…
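The abstract defines fairness as performing equally on different demographic breakdowns. As a rough illustration only, that idea can be sketched as the gap between the best and worst per-group accuracy. The function names, the choice of accuracy as the metric, and the toy data below are assumptions made for this example; they are not taken from the paper, whose own evaluation is domain-specific.

```python
from collections import defaultdict


def group_accuracies(y_true, y_pred, groups):
    """Return a dict mapping each demographic group to its accuracy."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}


def accuracy_gap(y_true, y_pred, groups):
    """Difference between the best and worst per-group accuracy.

    A gap of 0 means the model performs equally on every group
    (under this simplified, hypothetical notion of fairness).
    """
    accs = group_accuracies(y_true, y_pred, groups)
    return max(accs.values()) - min(accs.values())


# Toy labels and predictions for two hypothetical groups, "a" and "b".
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

print(group_accuracies(y_true, y_pred, groups))
print(accuracy_gap(y_true, y_pred, groups))
```

Comparing this gap for a supervised model and an SSL model evaluated on the same test split is one simple way to express a claim such as "fairer with minimal performance loss"; other fairness metrics could be substituted in the same structure.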
Taxonomy
Topics: Robotic Process Automation Applications · Machine Learning and Data Classification
