TL;DR
This paper introduces a scalable, dictionary-based method for automatically detecting lexical gender in large language datasets, enabling dynamic and high-coverage analysis of gender bias.
Contribution
The authors propose a novel automated approach for lexical gender detection that overcomes limitations of manual lexicon compilation, providing up-to-date and comprehensive gender identification.
Findings
Achieves over 80% accuracy in determining the lexical gender of nouns
Effective on Wikipedia samples and on word lists from previous research
Addresses the static and subjective nature of manually compiled lexicons
Abstract
This paper presents a new method for automatically detecting words with lexical gender in large-scale language datasets. Currently, the evaluation of gender bias in natural language processing relies on manually compiled lexicons of gendered expressions, such as pronouns ('he', 'she', etc.) and nouns with lexical gender ('mother', 'boyfriend', 'policewoman', etc.). However, manually compiled lists become static if they are not periodically updated, and they often involve value judgments by individual annotators and researchers. Moreover, terms not included in a list fall outside the range of analysis. To address these issues, we devised a scalable, dictionary-based method to automatically detect lexical gender that can provide a dynamic, up-to-date analysis with high coverage. Our approach reaches over 80% accuracy in determining the lexical gender of nouns…
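The abstract does not spell out how the dictionary lookup works, so the following is only a minimal sketch of one way a dictionary-based detector could operate, not the authors' implementation. It assumes that a word's lexical gender can be guessed by counting gendered cue words in its dictionary definition. The toy dictionary `TOY_DEFINITIONS`, the cue sets, and the function `lexical_gender` are all hypothetical names introduced for illustration.

```python
import re

# Hypothetical stand-in for a real dictionary resource (assumption):
# a scalable system would query a lexical database instead.
TOY_DEFINITIONS = {
    "mother": "a female parent",
    "boyfriend": "a male romantic partner",
    "policewoman": "a woman who is a member of a police force",
    "teacher": "a person who teaches, especially in a school",
}

# Illustrative cue words (assumption), not a list taken from the paper.
FEMININE_CUES = {"female", "woman", "women", "girl", "she", "her"}
MASCULINE_CUES = {"male", "man", "men", "boy", "he", "his", "him"}


def lexical_gender(word, definitions=TOY_DEFINITIONS):
    """Guess lexical gender from gendered cue words in a definition."""
    definition = definitions.get(word.lower())
    if definition is None:
        return "unknown"  # word is not covered by the dictionary

    # Whole-word tokens, so "female" is never counted as "male".
    tokens = re.findall(r"[a-z]+", definition.lower())
    feminine = sum(token in FEMININE_CUES for token in tokens)
    masculine = sum(token in MASCULINE_CUES for token in tokens)

    if feminine > masculine:
        return "feminine"
    if masculine > feminine:
        return "masculine"
    return "neutral"


for noun in ["mother", "boyfriend", "policewoman", "teacher"]:
    print(noun, lexical_gender(noun))
```

Because the label is derived from definitions at lookup time, a word added to the underlying dictionary is covered without editing any hand-maintained list, which is the property the abstract contrasts with static lexicons.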
