Shielded Representations: Protecting Sensitive Attributes Through Iterative Gradient-Based Projection

Shadi Iskander; Kira Radinsky; Yonatan Belinkov

arXiv:2305.10204 · cs.CL · May 18, 2023 · 1 citation

TL;DR

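This paper introduces Iterative Gradient-Based Projection (IGBP), an iterative method for removing non-linearly encoded information about sensitive attributes, such as gender and race, from the neural representations of NLP models. It mitigates bias with minimal impact on downstream task accuracy.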

Contribution

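IGBP is a novel method for removing non-linearly encoded concepts from neural representations. It iteratively trains neural classifiers to predict the target attribute, then projects the representations onto a hypersurface on which those classifiers become oblivious to that attribute.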

Findings

01. Effective bias mitigation for gender and race attributes.

02. Minimal impact on downstream task accuracy.

03. Outperforms existing linear removal methods.

Abstract

Natural language processing models tend to learn and encode social biases present in the data. One popular approach for addressing such biases is to eliminate encoded information from the model's representations. However, current methods are restricted to removing only linearly encoded information. In this work, we propose Iterative Gradient-Based Projection (IGBP), a novel method for removing non-linear encoded concepts from neural representations. Our method consists of iteratively training neural classifiers to predict a particular attribute we seek to eliminate, followed by a projection of the representation on a hypersurface, such that the classifiers become oblivious to the target attribute. We evaluate the effectiveness of our method on the task of removing gender and race information as sensitive attributes. Our results demonstrate that IGBP is effective in mitigating bias…
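The train-then-project loop described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation (the official repository uses PyTorch). It assumes a binary attribute, a one-hidden-layer tanh probe trained by plain gradient descent, and it reads "projection on a hypersurface" as a first-order step onto the probe's zero-logit surface, x' = x - f(x) ∇f(x) / ‖∇f(x)‖². The function names `train_probe`, `logit_and_grad`, `project_to_boundary`, and `igbp_sketch`, along with every hyperparameter, are hypothetical.

```python
import numpy as np


def train_probe(X, z, hidden=16, steps=300, lr=0.5, seed=0):
    """Fit a one-hidden-layer tanh MLP predicting binary attribute z from X."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 1.0 / np.sqrt(d), size=(d, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), size=hidden)
    b2 = 0.0
    for _ in range(steps):
        H = np.tanh(X @ W1 + b1)                      # hidden activations, (n, hidden)
        logit = H @ w2 + b2                           # probe output f(x), (n,)
        p = 1.0 / (1.0 + np.exp(-np.clip(logit, -50, 50)))
        g = (p - z) / n                               # d(mean cross-entropy)/d(logit)
        gH = np.outer(g, w2) * (1.0 - H ** 2)         # backprop through tanh
        W1 -= lr * (X.T @ gH)
        b1 -= lr * gH.sum(axis=0)
        w2 -= lr * (H.T @ g)
        b2 -= lr * g.sum()
    return W1, b1, w2, b2


def logit_and_grad(params, X):
    """Return the probe logit f(x) and its gradient with respect to x."""
    W1, b1, w2, b2 = params
    H = np.tanh(X @ W1 + b1)
    logit = H @ w2 + b2
    grad = ((1.0 - H ** 2) * w2) @ W1.T               # (n, d)
    return logit, grad


def project_to_boundary(params, X, eps=1e-8):
    """Assumed projection: one first-order step toward the surface f(x) = 0."""
    logit, grad = logit_and_grad(params, X)
    sq_norm = (grad ** 2).sum(axis=1, keepdims=True) + eps
    return X - (logit[:, None] / sq_norm) * grad


def igbp_sketch(X, z, rounds=3):
    """Alternate probe training and projection for a fixed number of rounds."""
    X = np.asarray(X, dtype=float)
    for r in range(rounds):
        params = train_probe(X, z, seed=r)            # fresh probe each round
        X = project_to_boundary(params, X)            # returns a new array
    return X
```

Calling `igbp_sketch(X, z)` with an `(n, d)` float array `X` and a length-`n` array `z` of 0/1 labels returns representations of the same shape. A fixed round count is used here only for simplicity; a stopping rule based on probe accuracy would be a natural alternative, and whether the paper uses one is not stated in the abstract.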

Code & Models

Repositories

technion-cs-nlp/igbp_nonlinear-removal
PyTorch · Official

Taxonomy

Topics: Topic Modeling