Controlling Bias Exposure for Fair Interpretable Predictions

Zexue He; Yu Wang; Julian McAuley; Bodhisattwa Prasad Majumder

arXiv:2210.07455 · cs.CL · October 25, 2022 · 1 citation

TL;DR

This paper introduces a novel debiasing algorithm for NLP models that balances bias mitigation and task performance by controlling the use of sensitive information rather than eliminating it entirely.

Contribution

The work proposes a new debiasing method that adjusts the model's reliance on sensitive attributes based on their relevance, providing more nuanced bias control.
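The idea of adjusting reliance on sensitive information, rather than removing it outright, can be illustrated with a toy sketch. This is not the authors' algorithm. It assumes hypothetical per-token task scores and bias scores are already available, and it uses a made-up `penalty` weight to decide how costly it is to keep a token that carries sensitive information. The function name `select_rationale` and all values are illustrative assumptions.

```python
# Toy illustration only: NOT the method from the paper.
# Assumption: each token has a task-relevance score and a bias score in [0, 1].
# A token is kept if its task score, minus a penalty proportional to its
# bias score, ranks among the top k.

def select_rationale(tokens, task_scores, bias_scores, penalty, k):
    """Return the k tokens with the highest penalized score, in input order."""
    if not (len(tokens) == len(task_scores) == len(bias_scores)):
        raise ValueError("tokens and score lists must have equal length")
    # Penalized score: task usefulness discounted by sensitive-information load.
    adjusted = [t - penalty * b for t, b in zip(task_scores, bias_scores)]
    # Rank token indices by penalized score, highest first.
    ranked = sorted(range(len(tokens)), key=lambda i: adjusted[i], reverse=True)
    # Keep the top k and restore the original word order.
    keep = sorted(ranked[:k])
    return [tokens[i] for i in keep]


# Hypothetical scores: "she" is somewhat predictive but carries gender information.
tokens = ["she", "treats", "patients", "daily"]
task_scores = [0.6, 0.9, 0.8, 0.1]
bias_scores = [0.9, 0.0, 0.0, 0.0]

no_penalty = select_rationale(tokens, task_scores, bias_scores, penalty=0.0, k=3)
with_penalty = select_rationale(tokens, task_scores, bias_scores, penalty=1.0, k=3)
print(no_penalty)    # the gendered token is selected when bias is not penalized
print(with_penalty)  # the gendered token is dropped once the penalty applies
```

Raising or lowering `penalty` in this sketch moves the selection between "use everything predictive" and "avoid sensitive tokens unless they are much more useful than the alternatives", which is the kind of trade-off control the summary describes.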

Findings

01 Achieves a better trade-off between debiasing and task accuracy.

02 Produces debiased rationales as evidence.

03 Effective on multiple NLP tasks influenced by gender and race.

Abstract

Recent work on reducing bias in NLP models usually focuses on protecting or isolating information related to a sensitive attribute (like gender or race). However, when sensitive information is semantically entangled with the task information of the input, e.g., gender information is predictive for a profession, a fair trade-off between task performance and bias mitigation is difficult to achieve. Existing approaches perform this trade-off by eliminating bias information from the latent space, lacking control over how much bias is necessarily required to be removed. We argue that a favorable debiasing method should use sensitive information 'fairly', rather than blindly eliminating it (Caliskan et al., 2017; Sun et al., 2019; Bogen et al., 2020). In this work, we provide a novel debiasing algorithm by adjusting the predictive model's belief to (1) ignore the sensitive information if it…

Code & Models

Repositories

zexuehe/interpretable_debiasing
PyTorch · Official

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Topic Modeling