InterFair: Debiasing with Natural Language Feedback for Fair Interpretable Predictions

Bodhisattwa Prasad Majumder; Zexue He; Julian McAuley

arXiv:2210.07440 · cs.CL · October 24, 2023 · 1 citation

TL;DR

This paper proposes an interactive approach to debias NLP models using natural language feedback, enabling better bias mitigation and task performance without removing sensitive information entirely.

Contribution

It introduces two user-in-the-loop setups that leverage natural language feedback to achieve fairer and more effective debiasing in NLP models.

Findings

1. Bias in explanations decreased by 5-8% while prediction accuracy was maintained.
2. Human feedback disentangled bias from predictive information, improving bias mitigation.
3. Task performance improved by 4-5% through interactive feedback.

Abstract

Debiasing methods in NLP models traditionally focus on isolating information related to a sensitive attribute (e.g., gender or race). We instead argue that a favorable debiasing method should use sensitive information 'fairly,' with explanations, rather than blindly eliminating it. This fair balance is often subjective and can be challenging to achieve algorithmically. We explore two interactive setups with a frozen predictive model and show that users able to provide feedback can achieve a better and fairer balance between task performance and bias mitigation. In one setup, users, by interacting with test examples, further decreased bias in the explanations (5-8%) while maintaining the same prediction accuracy. In the other setup, human feedback was able to disentangle associated bias and predictive information from the input, leading to superior bias mitigation and improved task…

Code & Models

Datasets

avduarte333/arXivTection
dataset · 761 downloads

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Topic Modeling