TL;DR
This paper presents a neural machine translation (NMT) method that automatically generates gender-balanced conversational data in Spanish, addressing gender bias in NLP applications, which is especially pronounced in gender-inflected languages.
Contribution
It introduces a novel NMT approach for rewriting sentences to produce gender alternatives, enhancing inclusivity and balancing gender representation in NLP datasets.
Findings
Effective automatic generation of gender alternatives in Spanish
Promising results in reducing gender bias in conversational NLP
Applicable to creating balanced training data for NLP models
Abstract
Gender bias is a frequent occurrence in NLP-based applications and is especially pronounced in gender-inflected languages. Bias can appear through associations of certain adjectives and animate nouns with the natural gender of referents, but also through unbalanced grammatical gender frequencies of inflected words. This type of bias becomes more evident when generating conversational utterances where gender is not specified within the sentence, because most current NLP applications still work on a sentence-level context. As a step towards more inclusive NLP, this paper proposes an automatic and generalisable rewriting approach for short conversational sentences. The rewriting method can be applied to sentences that, without extra-sentential context, have multiple equivalent alternatives in terms of gender. The method can be applied both for creating gender-balanced outputs as well as for…
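To make the idea of "multiple equivalent alternatives in terms of gender" concrete, here is a minimal Python sketch of balancing a small set of Spanish utterances. It is not the paper's method: the paper uses an NMT rewriter, while this sketch swaps in a tiny hand-written lookup table as a stand-in. The lexicon, the function names `gender_alternative` and `balance`, and the sample sentences are all assumptions made for illustration, and the code only handles lowercase, whitespace-separated tokens without punctuation.

```python
from typing import List, Optional

# Hypothetical toy lexicon standing in for the NMT rewriter described in the abstract.
MASC_TO_FEM = {
    "cansado": "cansada",
    "listo": "lista",
    "contento": "contenta",
}
FEM_TO_MASC = {fem: masc for masc, fem in MASC_TO_FEM.items()}


def gender_alternative(sentence: str) -> Optional[str]:
    """Swap every gendered token found in the lexicon; return None if nothing changes."""
    tokens = sentence.split()
    swapped = [MASC_TO_FEM.get(t) or FEM_TO_MASC.get(t) or t for t in tokens]
    return " ".join(swapped) if swapped != tokens else None


def balance(sentences: List[str]) -> List[str]:
    """Keep each sentence and add its gender alternative when one exists (no duplicates)."""
    balanced: List[str] = []
    for sentence in sentences:
        for candidate in (sentence, gender_alternative(sentence)):
            if candidate is not None and candidate not in balanced:
                balanced.append(candidate)
    return balanced


data = ["estoy cansado", "gracias por todo", "estoy lista"]
print(balance(data))
```

Sentences such as "gracias por todo" carry no gendered form in this toy lexicon and pass through unchanged, while "estoy cansado" gains the alternative "estoy cansada". A learned rewriter would be needed to handle agreement across several words and forms outside a fixed list.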
