NLPGuard: A Framework for Mitigating the Use of Protected Attributes by NLP Classifiers

Salvatore Greco; Ke Zhou; Licia Capra; Tania Cerquitelli; Daniele Quercia

arXiv:2407.01697·cs.CL·November 19, 2024

TL;DR

NLPGuard is a framework that reduces the reliance of NLP classifiers on protected attributes by modifying their training data, without sacrificing model accuracy.

Contribution

It introduces a method that modifies training data to mitigate NLP classifiers' reliance on protected attributes. Traditional bias mitigation methods aim for comparable performance across groups and leave this reliance unaddressed.

Findings

01. Up to 23% of the most predictive words in existing NLP classifiers are associated with protected attributes.

02. NLPGuard reduces this reliance on protected attributes by up to 79%.

03. It slightly improves classifier accuracy while reducing reliance on protected attributes.

Abstract

AI regulations are expected to prohibit machine learning models from using sensitive attributes during training. However, the latest Natural Language Processing (NLP) classifiers, which rely on deep learning, operate as black-box systems, complicating the detection and remediation of such misuse. Traditional bias mitigation methods in NLP aim for comparable performance across different groups based on attributes like gender or race but fail to address the underlying issue of reliance on protected attributes. To partly fix that, we introduce NLPGuard, a framework for mitigating the reliance on protected attributes in NLP classifiers. NLPGuard takes an unlabeled dataset, an existing NLP classifier, and its training data as input, producing a modified training dataset that significantly reduces dependence on protected attributes without compromising accuracy. NLPGuard is applied to three…
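To make the input and output described above more concrete, here is a minimal Python sketch of one possible mitigation step. It assumes that the classifier's most predictive words and the set of protected-attribute words are already available, and that the training data is modified simply by deleting the flagged words. The function names (`protected_share`, `remove_protected`, `relative_reduction`), the deletion strategy, and the toy data are all illustrative assumptions, not the API of the `grecosalvatore/nlpguard` repository or the exact procedure used in the paper.

```python
import re


def protected_share(top_words, protected):
    """Fraction of the most predictive words that are flagged as protected."""
    if not top_words:
        return 0.0
    flagged = [w for w in top_words if w.lower() in protected]
    return len(flagged) / len(top_words)


def remove_protected(texts, protected):
    """Return copies of the training texts with flagged words deleted.

    Assumption: plain word deletion; other strategies are possible.
    """
    if not protected:
        return list(texts)
    alternatives = "|".join(re.escape(w) for w in sorted(protected))
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    cleaned = []
    for text in texts:
        stripped = pattern.sub(" ", text)
        cleaned.append(" ".join(stripped.split()))  # collapse extra spaces
    return cleaned


def relative_reduction(before, after):
    """Relative drop in reliance, e.g. 0.4 -> 0.1 is a 75% reduction."""
    return (before - after) / before if before else 0.0


# Hypothetical toy data, for illustration only.
top_words = ["awful", "women", "stupid", "muslim", "great"]
protected = {"women", "muslim"}
train_texts = ["Those women are awful", "a great day"]

share_before = protected_share(top_words, protected)
new_train_texts = remove_protected(train_texts, protected)
print(share_before)
print(new_train_texts)
```

In a full pipeline, the classifier would be retrained on `new_train_texts`, and the share of protected words among its most predictive words would be measured again to obtain the relative reduction.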

Code & Models

Repositories

grecosalvatore/nlpguard
PyTorch · Official
