Preserving Task-Relevant Information Under Linear Concept Removal

Floris Holstege; Shauli Ravfogel; Bram Wouters

arXiv:2506.10703 · cs.LG · November 17, 2025

TL;DR

This paper introduces SPLINCE, a post-hoc method that removes unwanted concepts from neural network representations while exactly preserving the representations' covariance with a target label, addressing fairness and interpretability concerns.

Contribution

SPLINCE exactly preserves label covariance while removing linear concept predictability. The authors show it is the unique solution that does so with minimal embedding distortion, and report that it outperforms baselines empirically.

Findings

01 Outperforms baselines on the Bias in Bios and Winobias benchmarks

02 Removes protected attributes with minimal loss of main-task information

03 Is the unique solution that removes linear concept predictability and preserves target covariance with minimal embedding distortion

Abstract

Modern neural networks often encode unwanted concepts alongside task-relevant information, leading to fairness and interpretability concerns. Existing post-hoc approaches can remove undesired concepts but often degrade useful signals. We introduce SPLINCE (Simultaneous Projection for LINear concept removal and Covariance prEservation), which eliminates sensitive concepts from representations while exactly preserving their covariance with a target label. SPLINCE achieves this via an oblique projection that 'splices out' the unwanted direction yet protects important label correlations. Theoretically, it is the unique solution that removes linear concept predictability and maintains target covariance with minimal embedding distortion. Empirically, SPLINCE outperforms baselines on benchmarks such as Bias in Bios and Winobias, removing protected attributes while minimally damaging main-task…
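
The two constraints named in the abstract can be illustrated with a small NumPy sketch. This is not the paper's closed-form SPLINCE projection: it does not attempt the minimal-distortion property, and the function names, the scalar concept z, the scalar label y, and the synthetic data are all assumptions made for illustration. It only builds one oblique projection P whose kernel contains the concept cross-covariance Cov(x, z), so that Cov(Px, z) = 0, and whose range contains the label cross-covariance Cov(x, y), so that Cov(Px, y) = Cov(x, y).

```python
import numpy as np


def cross_cov(X, v):
    """Empirical cross-covariance between the rows of X (n x d) and a scalar variable v (n,)."""
    Xc = X - X.mean(axis=0)
    vc = v - v.mean()
    return Xc.T @ vc / len(v)


def oblique_projection(X, z, y):
    """Illustrative oblique projection (hypothetical helper, not the paper's formula).

    Returns P with P @ a = 0 and P @ b = b, where a = Cov(x, z) and b = Cov(x, y).
    """
    a = cross_cov(X, z)  # direction to remove
    b = cross_cov(X, y)  # direction to protect
    # c is the part of a orthogonal to b, so c @ b == 0 and the b direction is untouched.
    c = a - (a @ b) / (b @ b) * b
    denom = c @ a
    if abs(denom) < 1e-12:
        raise ValueError("Cov(x, z) and Cov(x, y) are parallel; both constraints cannot hold.")
    return np.eye(X.shape[1]) - np.outer(a, c) / denom


# Synthetic data (assumed for illustration): z is the unwanted concept, y the task label.
rng = np.random.default_rng(0)
n, d = 2000, 5
z = rng.integers(0, 2, size=n).astype(float)
y = (rng.random(n) < 0.3 + 0.4 * z).astype(float)  # y is correlated with z
X = (
    rng.normal(size=(n, d))
    + np.outer(z, rng.normal(size=d))
    + np.outer(y, rng.normal(size=d))
)

P = oblique_projection(X, z, y)
X_clean = X @ P.T  # apply P to every row of X

print("max |Cov(Px, z)|:", np.abs(cross_cov(X_clean, z)).max())
print("max |Cov(Px, y) - Cov(x, y)|:", np.abs(cross_cov(X_clean, y) - cross_cov(X, y)).max())
```

Because P is not symmetric, it is an oblique rather than an orthogonal projection. An orthogonal projection that removes Cov(x, z) would generally also shrink Cov(x, y) whenever the two directions are not orthogonal.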

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning