Nonlinear Concept Erasure: a Density Matching Approach
Antoine Saillenfest, Pirmin Lemberger

TL;DR
This paper introduces $\bar{\text{L}}$EOPARD, a novel method for nonlinear concept erasure in NLP embeddings that effectively removes sensitive attribute information while preserving semantic content, enhancing fairness.
Contribution
The paper proposes a new orthogonal projection-based approach for nonlinear concept erasure that outperforms existing methods on NLP benchmarks and reduces bias in classifiers.
Findings
Achieves state-of-the-art nonlinear attribute erasure performance.
Effectively mitigates bias in deep nonlinear classifiers.
Preserves semantic structure while removing sensitive information.
Abstract
Ensuring that neural models used in real-world applications cannot infer sensitive information, such as demographic attributes like gender or race, from text representations is a critical challenge when fairness is a concern. We address this issue through concept erasure, a process that removes information related to a specific concept from distributed representations while preserving as much of the remaining semantic information as possible. Our approach involves learning an orthogonal projection in the embedding space, designed to make the class-conditional feature distributions of the discrete concept to erase indistinguishable after projection. By adjusting the rank of the projector, we control the extent of information removal, while its orthogonality ensures strict preservation of the local structure of the embeddings. Our method, termed $\bar{\text{L}}$EOPARD, achieves…
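To make the role of the projector's rank concrete, here is a minimal sketch of a rank-r orthogonal projector and the properties the abstract relies on. It is not the authors' training procedure: the projector here is random rather than learned, and the names `random_orthogonal_projector`, `d`, and `r` are hypothetical, chosen only for illustration.

```python
import numpy as np

def random_orthogonal_projector(d, r, seed=0):
    """Return a d x d orthogonal projector of rank r (illustrative only).

    Assumption: the projector is built from a random orthonormal basis.
    In the paper the projector is learned; this sketch only shows the
    algebraic object, not how it is fitted.
    """
    rng = np.random.default_rng(seed)
    # Reduced QR of a random d x r matrix gives r orthonormal columns.
    q, _ = np.linalg.qr(rng.standard_normal((d, r)))
    return q @ q.T

d, r = 8, 3
P = random_orthogonal_projector(d, r)

# An orthogonal projector is symmetric and idempotent.
is_symmetric = np.allclose(P, P.T)
is_idempotent = np.allclose(P @ P, P)

# Its rank is r: lowering r discards more directions of the embedding space.
rank = np.linalg.matrix_rank(P)

# Projection never increases the norm of an embedding.
x = np.random.default_rng(1).standard_normal(d)
norm_before = np.linalg.norm(x)
norm_after = np.linalg.norm(P @ x)
```

With `d = 8` and `r = 3`, five directions of the embedding space are removed. A smaller `r` removes more information, which corresponds to the trade-off the abstract describes between erasure strength and preserved content.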
Taxonomy
Topics: Time Series Analysis and Forecasting · Machine Learning and Data Classification · Data Stream Mining Techniques
