Controlling Bias Exposure for Fair Interpretable Predictions
Zexue He, Yu Wang, Julian McAuley, Bodhisattwa Prasad Majumder

TL;DR
This paper introduces a novel debiasing algorithm for NLP models that balances bias mitigation and task performance by controlling the use of sensitive information rather than eliminating it entirely.
Contribution
The work proposes a debiasing method that adjusts how much the model relies on sensitive attributes according to their relevance to the task, rather than removing them outright.
Findings
Achieves a better trade-off between debiasing and task accuracy.
Produces debiased rationales as evidence.
Effective on multiple NLP tasks influenced by gender and race.
Abstract
Recent work on reducing bias in NLP models usually focuses on protecting or isolating information related to a sensitive attribute (like gender or race). However, when sensitive information is semantically entangled with the task information of the input, e.g., gender information is predictive for a profession, a fair trade-off between task performance and bias mitigation is difficult to achieve. Existing approaches perform this trade-off by eliminating bias information from the latent space, lacking control over how much bias is necessarily required to be removed. We argue that a favorable debiasing method should use sensitive information 'fairly', rather than blindly eliminating it (Caliskan et al., 2017; Sun et al., 2019; Bogen et al., 2020). In this work, we provide a novel debiasing algorithm by adjusting the predictive model's belief to (1) ignore the sensitive information if it…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Topic Modeling
