Fairness via Representation Neutralization
Mengnan Du, Subhabrata Mukherjee, Guanchu Wang, Ruixiang Tang, Ahmed Hassan Awadallah, Xia Hu

TL;DR
This paper introduces Representation Neutralization for Fairness (RNF), a method that reduces bias in deep neural networks by debiasing only the classification head, avoiding the need for extensive sensitive attribute annotations.
Contribution
RNF is a novel approach that trains the classification head on neutralized representations. This mitigates bias without debiasing the encoder and without extensive instance-level sensitive attribute annotations.
Findings
Effectively reduces discrimination in DNN models.
Maintains high task-specific performance.
Works in low-resource settings with proxy annotations.
Abstract
Existing bias mitigation methods for DNN models primarily work on learning debiased encoders. This process not only requires many instance-level annotations for sensitive attributes but also does not guarantee that all fairness-sensitive information has been removed from the encoder. To address these limitations, we explore the following research question: Can we reduce the discrimination of DNN models by only debiasing the classification head, even with biased representations as inputs? To this end, we propose a new mitigation technique, namely, Representation Neutralization for Fairness (RNF), that achieves fairness by debiasing only the task-specific classification head of DNN models. Specifically, we leverage samples with the same ground-truth label but different sensitive attributes, and use their neutralized representations to train the classification head of the DNN model. The…
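The abstract describes the training signal only at a high level. The NumPy sketch below illustrates one possible reading of it, under several assumptions not stated here. The encoder is treated as frozen, and its outputs are simulated with synthetic two-dimensional data. Each representation is "neutralized" by averaging it with a randomly chosen partner that has the same ground-truth label but a different sensitive attribute; the equal 0.5 weighting is an assumption. A plain logistic-regression head trained by gradient descent stands in for the classification head. The helper names `neutralize` and `train_head` are hypothetical. This is not the authors' implementation.

```python
import numpy as np

def neutralize(z, y, a, rng):
    """Hypothetical helper: average each representation with a partner that
    has the same label y but a different sensitive attribute a."""
    z_out, y_out = [], []
    for i in range(len(z)):
        partners = np.flatnonzero((y == y[i]) & (a != a[i]))
        if partners.size == 0:
            continue  # no partner with a different attribute: skip sample
        j = rng.choice(partners)
        z_out.append(0.5 * (z[i] + z[j]))  # assumed equal-weight mix
        y_out.append(y[i])
    return np.array(z_out), np.array(y_out)

def train_head(z, y, lr=0.1, epochs=500):
    """Hypothetical helper: fit a logistic-regression head by full-batch
    gradient descent on the (frozen) representations z."""
    w, b = np.zeros(z.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(z @ w + b)))
        err = p - y
        w -= lr * z.T @ err / len(y)
        b -= lr * err.mean()
    return w, b

# Synthetic stand-in for frozen encoder outputs: dim 0 carries a noisy task
# signal; dim 1 leaks the sensitive attribute, which agrees with the label
# for about 90% of samples (a biased training set).
rng = np.random.default_rng(0)
n = 2000
y = rng.integers(0, 2, n)
a = np.where(rng.random(n) < 0.9, y, 1 - y)
z = np.column_stack([
    (2 * y - 1) + rng.normal(0.0, 1.0, n),
    (2 * a - 1) + rng.normal(0.0, 0.1, n),
])

# Head trained on raw (biased) representations vs. neutralized ones.
w_naive, _ = train_head(z, y.astype(float))
z_neu, y_neu = neutralize(z, y, a, rng)
w_neutral, _ = train_head(z_neu, y_neu.astype(float))
print(f"weight on attribute dim, naive head:       {w_naive[1]:.3f}")
print(f"weight on attribute dim, neutralized head: {w_neutral[1]:.3f}")
```

Because each averaged pair has opposite attribute values, the attribute dimension of the neutralized representations retains only noise. The head therefore has little reason to rely on it. The naive head, trained on raw representations, picks up the spurious attribute-label correlation instead.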
Taxonomy
Topics: Ethics and Social Impacts of AI · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
