Linear Adversarial Concept Erasure
Shauli Ravfogel, Michael Twiton, Yoav Goldberg, Ryan Cotterell

TL;DR
This paper introduces a method to identify and erase linear subspaces in neural representations that encode specific concepts, such as gender, to mitigate bias while preserving model performance.
Contribution
It formulates concept erasure as a constrained, linear maximin game, derives a closed-form solution for certain objectives, and proposes a convex relaxation, R-LACE, for the objectives that lack one, applying it to bias mitigation.
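One way to write the game described above is sketched below. The notation is ours and is an assumption, not copied from the paper: $x_n \in \mathbb{R}^D$ are representations, $y_n$ the concept labels, $\ell$ a loss, $\theta$ a linear predictor, and $P$ a rank-$k$ orthogonal projection onto the subspace being erased. The erasing player picks $P$ to maximize the best predictor's loss; the convex relaxation replaces the non-convex set of rank-$k$ projections with its convex hull, the Fantope.

```latex
% Maximin game: the eraser chooses P, then the predictor responds with theta
\max_{P \in \mathcal{P}_k} \; \min_{\theta} \; \sum_{n=1}^{N} \ell\!\left(y_n,\; \theta^\top (I - P)\, x_n\right),
\qquad
\mathcal{P}_k = \{\, P \in \mathbb{R}^{D \times D} : P = P^\top = P^2,\ \operatorname{rank}(P) = k \,\}

% Convex relaxation: replace P_k by the Fantope (its convex hull)
\mathcal{F}_k = \{\, P \in \mathbb{S}^{D} : 0 \preceq P \preceq I,\ \operatorname{tr}(P) = k \,\}
```

Because $\mathcal{F}_k$ is convex and contains every rank-$k$ projection, optimizing over it gives a tractable surrogate; a relaxed solution still has to be mapped back to an actual projection before it is used for erasure.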
Findings
Removing the recovered low-dimensional subspace mitigates binary gender bias in neural representations, by both intrinsic and extrinsic evaluation.
It maintains model accuracy while reducing bias.
The approach is interpretable and computationally tractable.
Abstract
Modern neural models trained on textual data rely on pre-trained representations that emerge without direct supervision. As these representations are increasingly used in real-world applications, the inability to control their content becomes an important problem. We formulate the problem of identifying and erasing a linear subspace that corresponds to a given concept, in order to prevent linear predictors from recovering the concept. We model this problem as a constrained, linear maximin game, and show that existing solutions are generally not optimal for this task. We derive a closed-form solution for certain objectives, and propose a convex relaxation, R-LACE, that works well for others. When evaluated in the context of binary gender removal, the method recovers a low-dimensional subspace whose removal mitigates bias by intrinsic and extrinsic evaluation.…
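To make "a closed-form solution for certain objectives" concrete, here is a minimal sketch for one such case, under stated assumptions: squared-loss linear regression with a scalar target, centered data and no intercept, and a rank-1 erasure. It is our illustration, not necessarily the paper's derivation. Removing the single direction u ∝ Xᵀy (so the eraser is I − uuᵀ, i.e. I − P with P = uuᵀ) makes the erased features uncorrelated with y, so the best least-squares fit can do no better than predicting zero. The helper names `rank1_regression_eraser` and `best_lstsq_loss` are hypothetical.

```python
import numpy as np


def rank1_regression_eraser(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Hypothetical helper: return I - u u^T, with u the unit vector along X^T y."""
    u = X.T @ y
    u = u / np.linalg.norm(u)
    return np.eye(X.shape[1]) - np.outer(u, u)


def best_lstsq_loss(X: np.ndarray, y: np.ndarray) -> float:
    """Squared loss of the best linear predictor (no intercept) on features X."""
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ theta) ** 2))


# Synthetic data (an assumption for illustration): y is linear in X plus small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=200)

eraser = rank1_regression_eraser(X, y)
loss_before = best_lstsq_loss(X, y)
loss_after = best_lstsq_loss(X @ eraser, y)

# After erasure, (X @ eraser)^T y is numerically zero, so the best predictor's
# loss equals ||y||^2, the loss of always predicting 0.
print(loss_before, loss_after, float(y @ y))
```

Since predicting zero is always available to the regressor, no projection can push its loss above ‖y‖², so in this setting erasing one direction already reaches the adversary's optimum. Other losses (for example, logistic loss for classification) do not admit this shortcut, which is where a relaxation-based method is needed.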
Code & Models
- 🤗 aieng-lab/bert-base-cased-gradiend-gender-debiased (model)
- 🤗 aieng-lab/bert-large-cased-gradiend-gender-debiased (model, 6 downloads)
- 🤗 aieng-lab/distilbert-base-cased-gradiend-gender-debiased (model, 6 downloads)
- 🤗 aieng-lab/roberta-large-gradiend-gender-debiased (model, 4 downloads)
- 🤗 aieng-lab/gpt2-gradiend-gender-debiased (model, 3 downloads)
- 🤗 aieng-lab/Llama-3.2-3B-gradiend-gender-debiased (model, 5 downloads)
- 🤗 aieng-lab/Llama-3.2-3B-Instruct-gradiend-gender-debiased (model, 5 downloads)
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning · Machine Learning in Healthcare
