Mitigating Label Bias via Decoupled Confident Learning
Yunyi Li, Maria De-Arteaga, Maytal Saar-Tsechansky

TL;DR
This paper introduces Decoupled Confident Learning (DeCoLe), a pruning method for mitigating label bias in training data. The authors illustrate it on a synthetic dataset and then apply it to hate speech detection, where it is reported to outperform competing approaches.
Contribution
The paper proposes DeCoLe, a pruning-based method designed specifically to identify and remove instances with biased labels from machine learning datasets.
Findings
DeCoLe effectively identifies biased labels in synthetic datasets.
DeCoLe outperforms competing methods in hate speech detection tasks.
By addressing bias in the labels themselves, the approach targets a problem that most bias-mitigation methods overlook, since they assume the observed training labels are correct.
Abstract
Growing concerns regarding algorithmic fairness have led to a surge in methodologies to mitigate algorithmic bias. However, such methodologies largely assume that observed labels in training data are correct. This is problematic because bias in labels is pervasive across important domains, including healthcare, hiring, and content moderation. In particular, human-generated labels are prone to encoding societal biases. While the presence of labeling bias has been discussed conceptually, there is a lack of methodologies to address this problem. We propose a pruning method -- Decoupled Confident Learning (DeCoLe) -- specifically designed to mitigate label bias. After illustrating its performance on a synthetic dataset, we apply DeCoLe in the context of hate speech detection, where label bias has been recognized as an important challenge, and show that it successfully identifies biased…
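The abstract does not spell out the pruning procedure, so the following is only a minimal sketch of what the method's name suggests: confident-learning-style thresholding applied separately within each group. The function name `decoupled_confident_prune`, the binary-label setup, the toy scores, and the use of a per-group, per-class mean probability as the threshold are all assumptions made for illustration, not the authors' implementation.

```python
from collections import defaultdict


def decoupled_confident_prune(probs, labels, groups):
    """Return indices of examples whose observed label looks suspect.

    Hypothetical helper (not from the paper):
      probs[i]  -- predicted probability that example i is class 1,
                   ideally out-of-sample
      labels[i] -- observed, possibly biased, label (0 or 1)
      groups[i] -- group identifier for example i
    """
    # Assumed threshold rule: for each (group, class) pair, the mean
    # predicted probability of that class among the group's examples
    # that were observed with that class.
    sums, counts = defaultdict(float), defaultdict(int)
    for p, y, g in zip(probs, labels, groups):
        sums[(g, y)] += p if y == 1 else 1.0 - p
        counts[(g, y)] += 1
    thresholds = {key: sums[key] / counts[key] for key in sums}

    flagged = []
    for i, (p, y, g) in enumerate(zip(probs, labels, groups)):
        class_probs = {0: 1.0 - p, 1: p}
        # Classes the model is "confident" about under this group's thresholds.
        confident = [c for c in (0, 1)
                     if (g, c) in thresholds
                     and class_probs[c] >= thresholds[(g, c)]]
        if not confident:
            continue
        confident_label = max(confident, key=lambda c: class_probs[c])
        if confident_label != y:
            flagged.append(i)  # candidate for pruning
    return flagged


# Toy data: group "b" receives systematically lower scores than group "a".
probs = [0.9, 0.8, 0.2, 0.1, 0.95, 0.6, 0.55, 0.1, 0.15, 0.65]
labels = [1, 1, 0, 0, 0, 1, 1, 0, 0, 0]
groups = ["a"] * 5 + ["b"] * 5

flagged_decoupled = decoupled_confident_prune(probs, labels, groups)
# Pooled baseline: treat every example as belonging to one group.
flagged_pooled = decoupled_confident_prune(probs, labels, ["all"] * 10)
```

In this toy setup the per-group thresholds flag one suspect label in each group, while the pooled thresholds miss the one in group "b" because that group's scores are lower overall. This is meant only to illustrate why decoupling the thresholds by group could matter.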
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Adversarial Robustness in Machine Learning
Methods: Pruning
