TL;DR
Fair-SSL is a semi-supervised framework that mitigates ethical bias in machine learning models using only 10% labeled data, while balancing fairness and predictive performance across multiple datasets.
Contribution
This work is the first to apply semi-supervised learning techniques for ethical bias mitigation in software engineering machine learning models, reducing the need for extensive labeled data.
Findings
Achieves fairness performance similar to that of state-of-the-art methods
Requires only 10% of labeled data for bias mitigation
Effective across multiple datasets and learners
Abstract
Ethical bias in machine learning models has become a matter of concern in the software engineering community. Most prior software engineering work concentrated on finding ethical bias in models rather than fixing it. After finding bias, the next step is mitigation. Prior researchers mainly used supervised approaches to achieve fairness. However, in the real world, data with trustworthy ground truth is hard to obtain, and the ground truth itself can contain human bias. Semi-supervised learning is a machine learning technique in which labeled data is used incrementally to generate pseudo-labels for the rest of the data, and all of that data is then used for model training. In this work, we apply four popular semi-supervised techniques as pseudo-labelers to create fair classification models. Our framework, Fair-SSL, takes a very small amount (10%) of labeled data as input and…
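The pseudo-labeling loop described in the abstract can be illustrated with a minimal sketch of generic self-training. This is not the Fair-SSL implementation, and self-training is not necessarily one of the four techniques the authors use. The nearest-centroid learner, the confidence formula, the `threshold` and `max_rounds` parameters, and the synthetic two-class data are all assumptions made for illustration; only the 10% labeled fraction comes from the text above.

```python
import random

def centroid_fit(X, y):
    """Fit a nearest-centroid model: one mean vector per class (stand-in learner)."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, v in enumerate(row):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def centroid_predict(model, row):
    """Return (label, confidence); confidence is 1 - d_nearest / sum of distances."""
    dists = {c: sum((a - b) ** 2 for a, b in zip(row, mu)) ** 0.5
             for c, mu in model.items()}
    label = min(dists, key=dists.get)
    total = sum(dists.values())
    conf = 1.0 - dists[label] / total if total > 0 else 1.0
    return label, conf

def self_train(X_lab, y_lab, X_unlab, threshold=0.7, max_rounds=10):
    """Incrementally move confidently predicted rows from the pool to the labeled set."""
    X_lab, y_lab = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    for _ in range(max_rounds):
        if not pool:
            break
        model = centroid_fit(X_lab, y_lab)
        remaining, added = [], 0
        for row in pool:
            label, conf = centroid_predict(model, row)
            if conf >= threshold:
                X_lab.append(row)      # accept the pseudo-label
                y_lab.append(label)
                added += 1
            else:
                remaining.append(row)  # too uncertain, retry next round
        pool = remaining
        if added == 0:
            break                      # nothing new was labeled; stop early
    # Final model is trained on true labels plus accepted pseudo-labels.
    return centroid_fit(X_lab, y_lab), X_lab, y_lab, pool

# Synthetic two-class data (hypothetical): 100 points per class.
rng = random.Random(0)
class0 = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(100)]
class1 = [[rng.gauss(4, 1), rng.gauss(4, 1)] for _ in range(100)]

# Keep labels for 10% of the data (10 rows per class); the rest is unlabeled.
X_labeled = class0[:10] + class1[:10]
y_labeled = [0] * 10 + [1] * 10
X_unlabeled = class0[10:] + class1[10:]

model, X_all, y_all, leftover = self_train(X_labeled, y_labeled, X_unlabeled)
print(len(X_all), "rows used for training;", len(leftover), "left unlabeled")
```

Rows that never reach the confidence threshold stay in `leftover` and are excluded from training. A fairness-aware pipeline would add further steps around this loop, which the sketch does not attempt to reproduce.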
