Fairness in Semi-supervised Learning: Unlabeled Data Help to Reduce Discrimination
Tao Zhang, Tianqing Zhu, Jing Li, Mengde Han, Wanlei Zhou, and Philip S. Yu

TL;DR
This paper explores how semi-supervised learning can leverage unlabeled data to reduce discrimination in machine learning models, proposing a new framework that improves fairness without sacrificing accuracy.
Contribution
It introduces a novel fair semi-supervised learning framework combining pseudo-labeling, re-sampling, and ensemble methods to enhance fairness and accuracy.
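As a rough illustration of how pseudo-labeling, re-sampling, and an ensemble could fit together, here is a minimal sketch. It is not the authors' implementation: the nearest-centroid base learner, the equal-size sampling per (group, label) cell, the majority vote, and every function name and toy value are assumptions chosen only to keep the example self-contained.

```python
import random
from collections import Counter, defaultdict

def fit_centroids(X, y):
    """Toy base learner (assumption): mean feature vector per label."""
    sums, counts = {}, Counter(y)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

def predict(centroids, x):
    """Return the label whose centroid is nearest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
    )

def balanced_resample(X, y, groups, rng):
    """Draw the same number of points, with replacement, from each (group, label) cell."""
    cells = defaultdict(list)
    for x, label, g in zip(X, y, groups):
        cells[(g, label)].append(x)
    n = min(len(points) for points in cells.values())
    Xs, ys = [], []
    for (_, label), points in cells.items():
        for x in rng.choices(points, k=n):
            Xs.append(x)
            ys.append(label)
    return Xs, ys

def fair_ssl_ensemble(X_lab, y_lab, g_lab, X_unl, g_unl, n_models=5, seed=0):
    rng = random.Random(seed)
    # Step 1: pseudo-label the unlabeled points with a model fit on labeled data.
    base = fit_centroids(X_lab, y_lab)
    pseudo = [predict(base, x) for x in X_unl]
    X_all, y_all, g_all = X_lab + X_unl, y_lab + pseudo, g_lab + g_unl
    # Step 2: build several group-balanced training sets by re-sampling.
    # Step 3: train one model per set and combine them by majority vote.
    models = []
    for _ in range(n_models):
        Xs, ys = balanced_resample(X_all, y_all, g_all, rng)
        models.append(fit_centroids(Xs, ys))

    def vote(x):
        return Counter(predict(m, x) for m in models).most_common(1)[0][0]

    return vote

# Hypothetical toy data: one feature, binary label, protected group "a" or "b".
X_lab = [[0.0], [0.2], [1.0], [1.2]]
y_lab = [0, 0, 1, 1]
g_lab = ["a", "b", "a", "b"]
X_unl = [[0.1], [0.3], [0.9], [1.1], [1.3]]
g_unl = ["a", "a", "b", "b", "a"]

vote = fair_ssl_ensemble(X_lab, y_lab, g_lab, X_unl, g_unl)
```

Calling `vote([0.05])` or `vote([1.15])` classifies a new point with the ensemble. Balancing each training set across (group, label) cells is only one plausible re-sampling rule; the summary does not specify which rule the paper uses.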
Findings
Unlabeled data can improve fairness in machine learning models.
The proposed method achieves better fairness-accuracy trade-offs.
Theoretical analysis identifies sources of discrimination in semi-supervised learning.
Abstract
A growing specter in the rise of machine learning is whether the decisions made by machine learning models are fair. While research is already underway to formalize a machine-learning concept of fairness and to design frameworks for building fair models with sacrifice in accuracy, most are geared toward either supervised or unsupervised learning. Yet two observations inspired us to wonder whether semi-supervised learning might be useful to solve discrimination problems. First, a previous study showed that increasing the size of the training set may lead to a better trade-off between fairness and accuracy. Second, the most powerful models today require an enormous amount of data to train, which, in practical terms, is likely to be available only as a combination of labeled and unlabeled data. Hence, in this paper, we present a framework of fair semi-supervised learning in the pre-processing phase, including…
