TL;DR
NLPGuard is a framework that reduces the reliance of NLP classifiers on protected attributes without sacrificing model accuracy.
Contribution
It introduces a method that modifies training data to mitigate NLP classifiers' reliance on protected attributes. Traditional bias mitigation methods, which aim for comparable performance across groups, leave this reliance unaddressed.
Findings
Up to 23% of the most predictive words in existing NLP classifiers are associated with protected attributes.
NLPGuard reduces reliance on protected attributes by up to 79%.
It slightly improves classifier accuracy while reducing bias.
Abstract
AI regulations are expected to prohibit machine learning models from using sensitive attributes during training. However, the latest Natural Language Processing (NLP) classifiers, which rely on deep learning, operate as black-box systems, complicating the detection and remediation of such misuse. Traditional bias mitigation methods in NLP aim for comparable performance across different groups based on attributes like gender or race but fail to address the underlying issue of reliance on protected attributes. To partly fix that, we introduce NLPGuard, a framework for mitigating the reliance on protected attributes in NLP classifiers. NLPGuard takes an unlabeled dataset, an existing NLP classifier, and its training data as input, producing a modified training dataset that significantly reduces dependence on protected attributes without compromising accuracy. NLPGuard is applied to three…
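The abstract describes the output as a modified training dataset. The following is a minimal sketch of what such a modification step could look like, not the paper's implementation. It assumes that the modification consists of deleting flagged words, and that a list of protected-attribute words is already available. The names `PROTECTED_WORDS`, `moderate_text`, and `moderate_dataset` are hypothetical, and the step that finds predictive words with the existing classifier and the unlabeled dataset is omitted.

```python
import re

# Hypothetical word list for illustration only. In a real pipeline it would be
# derived by analysing which predictive words relate to protected attributes.
PROTECTED_WORDS = {"she", "he", "woman", "man"}


def moderate_text(text, protected_words):
    """Return text with every token found in protected_words removed.

    Assumption: removal is only one possible way to modify training data.
    Matching is case-insensitive; punctuation is kept as separate tokens.
    """
    tokens = re.findall(r"\w+|[^\w\s]", text)
    kept = [tok for tok in tokens if tok.lower() not in protected_words]
    return " ".join(kept)


def moderate_dataset(examples, protected_words):
    """Apply moderate_text to each (text, label) pair; labels are unchanged."""
    return [(moderate_text(text, protected_words), label)
            for text, label in examples]


# Toy training set in the spirit of an occupation classification task.
train = [
    ("She is a great nurse", "nurse"),
    ("He fixed the engine", "mechanic"),
]
modified = moderate_dataset(train, PROTECTED_WORDS)
```

A classifier retrained on `modified` can no longer use the removed words as features, which is the intuition behind reducing reliance on protected attributes through the training data rather than the model.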
