Reducing Gender Bias in Machine Translation through Counterfactual Data Generation
Ranjita Naik, Spencer Rarrick, Vishal Chowdhary

TL;DR
This paper introduces a simple yet effective method to reduce gender bias in neural machine translation by augmenting training data with counterfactual examples, improving gender accuracy across multiple languages.
Contribution
The paper proposes a novel domain-adaptation approach that uses counterfactual data generation to mitigate gender bias without sacrificing translation quality.
Findings
Significant reduction in gender bias on the WinoMT test set
Effective for English-to-French, English-to-Spanish, and English-to-Italian translation
Maintains translation quality while reducing bias
Abstract
Recent advances in neural methods have led to substantial improvement in the quality of Neural Machine Translation (NMT) systems. However, these systems frequently produce translations with inaccurate gender (Stanovsky et al., 2019), which can be traced to bias in training data. Saunders and Byrne (2020) tackle this problem with a handcrafted dataset containing balanced gendered profession words. By using this data to fine-tune an existing NMT model, they show that gender bias can be significantly mitigated, albeit at the expense of translation quality due to catastrophic forgetting. They recover some of the lost quality with modified training objectives or additional models at inference. We find, however, that simply supplementing the handcrafted dataset with a random sample from the base model training corpus is enough to significantly reduce the catastrophic forgetting. We also…
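The data-mixing idea in the abstract (supplementing the handcrafted gender-balanced set with a random sample from the base model's training corpus before fine-tuning) can be sketched as below. This is a minimal illustration, not the authors' implementation: the function name `build_finetuning_set`, the `sample_size` parameter, and the toy sentence pairs are all assumptions, and the abstract does not state how large the random sample should be.

```python
import random
from typing import List, Tuple

# A parallel example: (source sentence, target sentence).
SentencePair = Tuple[str, str]


def build_finetuning_set(
    handcrafted: List[SentencePair],
    base_corpus: List[SentencePair],
    sample_size: int,
    seed: int = 0,
) -> List[SentencePair]:
    """Mix a handcrafted set with a random sample of the base training corpus.

    Hypothetical helper: `sample_size` is a free parameter here, since the
    abstract does not specify the mixing ratio.
    """
    rng = random.Random(seed)
    # Never request more pairs than the base corpus contains.
    k = min(sample_size, len(base_corpus))
    sampled = rng.sample(base_corpus, k)
    mixed = list(handcrafted) + sampled
    # Shuffle so fine-tuning batches interleave both data sources.
    rng.shuffle(mixed)
    return mixed


# Toy data for illustration only (not taken from the paper).
handcrafted_pairs = [
    ("The doctor finished her shift.", "La doctora terminó su turno."),
    ("The doctor finished his shift.", "El doctor terminó su turno."),
]
base_pairs = [(f"source sentence {i}", f"target sentence {i}") for i in range(100)]

finetune_data = build_finetuning_set(handcrafted_pairs, base_pairs, sample_size=10)
print(len(finetune_data))
```

The intuition is that the sampled general-domain pairs keep the fine-tuning distribution anchored to the original training data, which is what the abstract credits with reducing catastrophic forgetting.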
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
Methods: Sparse Evolutionary Training · Balanced Selection
