FairFlow: An Automated Approach to Model-based Counterfactual Data Augmentation For NLP

Ewoenam Kwaku Tokpo; Toon Calders

arXiv:2407.16431·cs.CL·July 24, 2024

TL;DR

FairFlow is an automated, model-based method for generating high-quality counterfactual data to reduce societal biases in NLP models, overcoming limitations of previous dictionary-based approaches.

Contribution

The paper introduces an automated approach to generating parallel data for counterfactual data augmentation, reducing reliance on manually created parallel data and improving quality over dictionary-based methods.

Findings

01. Outperforms dictionary-based substitution in quality and context relevance.

02. Reduces the need for manual parallel data collection.

03. Effectively mitigates societal biases in NLP models.

Abstract

Despite the evolution of language models, they continue to portray harmful societal biases and stereotypes inadvertently learned from training data. These inherent biases often result in detrimental effects in various applications. Counterfactual Data Augmentation (CDA), which seeks to balance demographic attributes in training data, has been a widely adopted approach to mitigate bias in natural language processing. However, many existing CDA approaches rely on word substitution techniques using manually compiled word-pair dictionaries. These techniques often lead to out-of-context substitutions, resulting in potential quality issues. The advancement of model-based techniques, on the other hand, has been challenged by the need for parallel training data. Works in this area resort to manually generated parallel data that are expensive to collect and are consequently limited in scale.…
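
To make the dictionary-based baseline concrete, the following is a minimal sketch of word-pair substitution CDA. It is not FairFlow's method or code from the paper; the `WORD_PAIRS` dictionary and the `swap_words` and `augment` helpers are hypothetical names chosen for illustration. It shows how a fixed one-to-one mapping can yield an out-of-context substitution.

```python
import re

# Hypothetical, hand-compiled word-pair dictionary (illustration only).
# Note that "her" can be possessive ("her book") or objective ("met her"),
# but a one-to-one dictionary must commit to a single replacement.
WORD_PAIRS = {
    "he": "she", "she": "he",
    "his": "her", "her": "his",
    "man": "woman", "woman": "man",
}

_TOKEN = re.compile(r"[A-Za-z]+")


def swap_words(text, pairs=WORD_PAIRS):
    """Replace each word found in `pairs`, preserving initial capitalization."""
    def repl(match):
        word = match.group(0)
        target = pairs.get(word.lower())
        if target is None:
            return word  # not a listed attribute word: leave unchanged
        return target.capitalize() if word[0].isupper() else target
    return _TOKEN.sub(repl, text)


def augment(corpus):
    """Return each sentence followed by its dictionary-based counterfactual."""
    return [s for sentence in corpus for s in (sentence, swap_words(sentence))]


if __name__ == "__main__":
    print(swap_words("He thanked his sister."))  # fluent counterfactual
    print(swap_words("I met her yesterday."))    # out-of-context substitution
```

In the second call, "her" is an object pronoun, so the dictionary's possessive mapping produces "I met his yesterday." This is the kind of quality issue the abstract attributes to substitution-based CDA, and which context-aware, model-based generation is meant to avoid.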

Code & Models

Repositories

EwoeT/FairFlow (PyTorch, official)
