Linear Adversarial Concept Erasure

Shauli Ravfogel; Michael Twiton; Yoav Goldberg; Ryan Cotterell

arXiv:2201.12091·cs.LG·December 18, 2024·23 cites




TL;DR

This paper introduces a method to identify and erase linear subspaces in neural representations that encode specific concepts, such as gender, to mitigate bias while preserving model performance.
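Once a concept subspace has been found, erasing it is a single linear projection. The following is a minimal NumPy sketch. It assumes the subspace is already given as a k × D matrix W with orthonormal rows; finding a good W is the actual problem the paper addresses. The names erasure_projection and erase are hypothetical and do not come from the authors' code.

```python
import numpy as np

def erasure_projection(W: np.ndarray) -> np.ndarray:
    """Return P = I - W^T W, the orthogonal projection that removes the
    row space of W. Assumption: W is k x D with orthonormal rows."""
    D = W.shape[1]
    return np.eye(D) - W.T @ W

def erase(X: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Project each row of X (N x D) onto the orthogonal complement of span(W)."""
    # P is symmetric, so X @ P applies P to every row of X.
    return X @ erasure_projection(W)

# Hypothetical example: a random rank-2 "concept" subspace in R^5.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(5, 2)))  # 5 x 2, orthonormal columns
W = Q.T                                       # 2 x 5, orthonormal rows
X = rng.normal(size=(100, 5))
X_clean = erase(X, W)
```

Because P is an orthogonal projection, erasing twice changes nothing (P P = P). The coordinates of every erased vector along the rows of W are zero.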

Contribution

It formulates concept erasure as a constrained, linear maximin game, derives a closed-form solution for certain objectives, and proposes a convex relaxation, R-LACE, for the remaining ones, which it applies to bias mitigation.
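The game can be written in symbols as follows. The notation is assumed here and not copied from the paper: representations x_n in R^D with concept labels y_n, a linear predictor theta, a loss l, and an orthogonal projection P that neutralizes a rank-k subspace. The erasure player chooses P to maximize the loss that the best predictor can still achieve. The set P_k of such projections is not convex. Its convex hull is the Fantope F_k, so replacing P_k with F_k (taking P = I - A) gives one natural convex relaxation of the constraint set, of the kind described above.

```latex
\mathcal{P}_k = \{\, I_D - W^\top W \;:\; W \in \mathbb{R}^{k \times D},\ W W^\top = I_k \,\}

\max_{P \in \mathcal{P}_k}\ \min_{\theta \in \mathbb{R}^D}\ \sum_{n=1}^{N} \ell\big(y_n,\ \theta^\top P x_n\big)

\mathcal{F}_k = \operatorname{conv}\{\, W^\top W \,\} = \{\, A \in \mathbb{S}^D \;:\; 0 \preceq A \preceq I_D,\ \operatorname{tr}(A) = k \,\}, \qquad P = I_D - A
```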

Findings

01

Removing the recovered low-dimensional subspace mitigates binary gender bias in neural representations, as measured by both intrinsic and extrinsic evaluation.

02

It maintains model accuracy while reducing bias.

03

The approach is interpretable and computationally tractable.
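A relaxation like this stays tractable in part because projecting onto the relaxed constraint set is cheap. The sketch below assumes the Fantope relaxation, i.e. the convex hull of rank-k projection matrices. It shows the standard Euclidean projection onto that set, which needs one eigendecomposition plus a scalar bisection. It also shows one simple way to round a relaxed solution back to a hard rank-k erasure by keeping the top-k eigenvectors. This is an illustration under those assumptions, not the authors' implementation, and project_fantope and round_to_projection are hypothetical names.

```python
import numpy as np

def project_fantope(M: np.ndarray, k: int) -> np.ndarray:
    """Frobenius-norm projection of a symmetric matrix M onto the Fantope
    F_k = {A : 0 <= A <= I, tr(A) = k}.

    Shift all eigenvalues by a scalar gamma and clip them to [0, 1], choosing
    gamma by bisection so that the clipped eigenvalues sum to k.
    """
    M = (M + M.T) / 2                              # symmetrize against round-off
    eigvals, eigvecs = np.linalg.eigh(M)
    lo, hi = eigvals.min() - 1.0, eigvals.max()    # clipped sum is D at lo, 0 at hi
    for _ in range(100):                           # bisection on the shift gamma
        gamma = (lo + hi) / 2
        if np.clip(eigvals - gamma, 0.0, 1.0).sum() > k:
            lo = gamma                             # too much mass: shift further
        else:
            hi = gamma
    new_vals = np.clip(eigvals - (lo + hi) / 2, 0.0, 1.0)
    return (eigvecs * new_vals) @ eigvecs.T        # V diag(new_vals) V^T

def round_to_projection(A: np.ndarray, k: int) -> np.ndarray:
    """Turn a relaxed solution A into a hard erasure P = I - V_k V_k^T."""
    _, eigvecs = np.linalg.eigh(A)
    V_k = eigvecs[:, -k:]                          # eigh sorts ascending: last k are top k
    return np.eye(A.shape[0]) - V_k @ V_k.T

# Usage: project an arbitrary symmetric matrix, then round.
rng = np.random.default_rng(0)
B = rng.normal(size=(6, 6))
A = project_fantope(B + B.T, k=2)   # a point in F_2: eigenvalues in [0, 1], trace 2
P = round_to_projection(A, k=2)     # hard erasure of a 2-dimensional subspace
```

In a projected gradient-ascent loop over A, project_fantope would be called after each update. Each call costs one D × D eigendecomposition.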

Abstract

Modern neural models trained on textual data rely on pre-trained representations that emerge without direct supervision. As these representations are increasingly used in real-world applications, the inability to control their content becomes an important problem. We formulate the problem of identifying and erasing a linear subspace that corresponds to a given concept, in order to prevent linear predictors from recovering the concept. We model this problem as a constrained, linear maximin game, and show that existing solutions are generally not optimal for this task. We derive a closed-form solution for certain objectives, and propose a convex relaxation, R-LACE, that works well for others. When evaluated in the context of binary gender removal, the method recovers a low-dimensional subspace whose removal mitigates bias, as measured by both intrinsic and extrinsic evaluation.…
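For at least one objective, the closed form is easy to verify by hand. Consider least-squares regression of the concept y on the representations X, with no intercept and k = 1. Whatever P is chosen, the predictor can always fall back to theta = 0, so the best achievable loss never exceeds ||y||^2. Projecting out the single direction u = X^T y / ||X^T y|| makes (XP)^T y = P X^T y = 0. This forces the best fit to zero and attains that bound. The NumPy sketch below checks this numerically on synthetic data. It illustrates the idea under these assumptions only and is not claimed to be the paper's derivation. The function names are hypothetical.

```python
import numpy as np

# Assumptions: squared loss, no intercept, a single (k = 1) erased direction.

def closed_form_rank1_erasure(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """P = I - u u^T with u = X^T y / ||X^T y||, the cross-covariance direction."""
    u = X.T @ y
    u = u / np.linalg.norm(u)
    return np.eye(X.shape[1]) - np.outer(u, u)

def best_squared_loss(X: np.ndarray, y: np.ndarray) -> float:
    """Loss of the best linear (no-intercept) least-squares predictor."""
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((X @ theta - y) ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=200)  # concept linearly encoded in X

P = closed_form_rank1_erasure(X, y)
before = best_squared_loss(X, y)      # small: y is linearly recoverable from X
after = best_squared_loss(X @ P, y)   # equals ||y||^2, the theta = 0 baseline
```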

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning · Machine Learning in Healthcare