Nonlinear Concept Erasure: a Density Matching Approach

Antoine Saillenfest; Pirmin Lemberger

arXiv:2507.12341·cs.LG·August 19, 2025

TL;DR

This paper introduces $\overline{\text{L}}$EOPARD, a novel method for nonlinear concept erasure in NLP embeddings that effectively removes sensitive attribute information while preserving semantic content, enhancing fairness.

Contribution

The paper proposes a new orthogonal projection-based approach for nonlinear concept erasure that outperforms existing methods on NLP benchmarks and reduces bias in classifiers.

Findings

01. Achieves state-of-the-art nonlinear attribute erasure performance.

02. Effectively mitigates bias in deep nonlinear classifiers.

03. Preserves semantic structure while removing sensitive information.

Abstract

Ensuring that neural models used in real-world applications cannot infer sensitive information, such as demographic attributes like gender or race, from text representations is a critical challenge when fairness is a concern. We address this issue through concept erasure, a process that removes information related to a specific concept from distributed representations while preserving as much of the remaining semantic information as possible. Our approach involves learning an orthogonal projection in the embedding space, designed to make the class-conditional feature distributions of the discrete concept to be erased indistinguishable after projection. By adjusting the rank of the projector, we control the extent of information removal, while its orthogonality ensures strict preservation of the local structure of the embeddings. Our method, termed $\overline{\text{L}}$EOPARD, achieves…
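To make the abstract's two ingredients concrete, here is a minimal NumPy sketch of a rank-$r$ orthogonal projector and of a discrepancy measured between class-conditional distributions before and after projection. It is an illustration under stated assumptions, not the authors' implementation: the projector is hand-picked for a toy dataset rather than learned, the RBF-kernel MMD is one common discrepancy measure and is not confirmed by the text above as the paper's objective, and the names `orthogonal_projector` and `rbf_mmd2` are hypothetical.

```python
import numpy as np


def orthogonal_projector(basis):
    """Build P = Q Q^T from a d x r matrix whose columns span the kept subspace."""
    q, _ = np.linalg.qr(basis)  # reduced QR: q is d x r with orthonormal columns
    return q @ q.T


def rbf_mmd2(x, y, gamma):
    """Biased estimate of the squared MMD between samples x and y (RBF kernel)."""
    def kernel(a, b):
        sq_dists = (a**2).sum(1)[:, None] + (b**2).sum(1)[None, :] - 2.0 * a @ b.T
        return np.exp(-gamma * sq_dists)
    return kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean()


rng = np.random.default_rng(0)
d, r, n = 8, 7, 200  # embedding dimension, projector rank, samples per class

# Toy "embeddings" (assumption): the two classes differ only by a shift
# along the first coordinate axis.
x0 = rng.standard_normal((n, d))
x1 = rng.standard_normal((n, d))
x1[:, 0] += 3.0

# Hand-picked rank-r projector that keeps axes 1..d-1 and drops axis 0.
# In the paper the projector is learned; here it is fixed for illustration.
P = orthogonal_projector(np.eye(d)[:, 1:])

# P is symmetric, so projecting row vectors with x @ P equals (P x^T)^T.
mmd_before = rbf_mmd2(x0, x1, gamma=1.0 / d)
mmd_after = rbf_mmd2(x0 @ P, x1 @ P, gamma=1.0 / d)

print("idempotent:", np.allclose(P @ P, P))
print("symmetric:", np.allclose(P, P.T))
print("rank (trace):", round(float(np.trace(P))))
print("MMD^2 before / after:", mmd_before, mmd_after)
```

Lowering `r` removes more directions from the embedding space, which corresponds to the rank acting as the control for how much information is erased. The trace of an orthogonal projector equals its rank, which is why the sketch reports the rank that way.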

Taxonomy

Topics: Time Series Analysis and Forecasting · Machine Learning and Data Classification · Data Stream Mining Techniques