Generating Gender Augmented Data for NLP

Nishtha Jain; Maja Popovic; Declan Groves; Eva Vanmassenhove

arXiv:2107.05987 · cs.CL · July 14, 2021

TL;DR

This paper presents a method based on neural machine translation (NMT) that automatically generates gender-balanced conversational data in Spanish. It targets gender bias in NLP applications, which is especially pronounced in gender-inflected languages.

Contribution

The paper introduces a novel NMT approach that rewrites sentences into their gender alternatives, which supports more inclusive output and more balanced gender representation in NLP datasets.

Findings

01 Effective automatic generation of gender alternatives in Spanish

02 Promising results in reducing gender bias in conversational NLP

03 Applicable to creating balanced training data for NLP models

Abstract

Gender bias is a frequent occurrence in NLP-based applications, especially pronounced in gender-inflected languages. Bias can appear through associations of certain adjectives and animate nouns with the natural gender of referents, but also due to unbalanced grammatical gender frequencies of inflected words. This type of bias becomes more evident in generating conversational utterances where gender is not specified within the sentence, because most current NLP applications still work on a sentence-level context. As a step towards more inclusive NLP, this paper proposes an automatic and generalisable rewriting approach for short conversational sentences. The rewriting method can be applied to sentences that, without extra-sentential context, have multiple equivalent alternatives in terms of gender. The method can be applied both for creating gender balanced outputs as well as for…
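As a rough illustration of how a rewriter of this kind could be used to balance a dataset, the sketch below adds a gender alternative for every sentence that has one. It is a minimal sketch under stated assumptions, not the paper's method: the paper trains an NMT model for rewriting, whereas `toy_rewrite` here is a hypothetical word-lookup stand-in, and `TOY_SWAPS`, `augment`, and the sample sentences are made up for this example.

```python
from typing import Callable, Iterable, List

# Hypothetical toy lexicon standing in for a trained rewriting model.
# The paper uses an NMT model; this lookup table is only an illustration.
TOY_SWAPS = {
    "cansado": "cansada",
    "cansada": "cansado",
    "contento": "contenta",
    "contenta": "contento",
}


def toy_rewrite(sentence: str) -> str:
    """Swap gender-marked words found in TOY_SWAPS; leave other words unchanged."""
    return " ".join(TOY_SWAPS.get(word, word) for word in sentence.split())


def augment(sentences: Iterable[str], rewrite: Callable[[str], str]) -> List[str]:
    """Return each sentence followed by its gender alternative, if one exists."""
    augmented: List[str] = []
    for sentence in sentences:
        augmented.append(sentence)
        alternative = rewrite(sentence)
        # Sentences with no gender-marked word come back unchanged
        # and are not duplicated.
        if alternative != sentence:
            augmented.append(alternative)
    return augmented


corpus = ["estoy cansado", "estoy aquí"]
balanced = augment(corpus, toy_rewrite)
```

Passing the rewriter in as a function keeps `augment` independent of how alternatives are produced, so the lookup table could be replaced by a call to a trained model without changing the augmentation loop.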

Code & Models

Repositories

awslabs/sockeye
mxnet · Official
