Robust Quantification of Gender Disparity in Pre-Modern English Literature using Natural Language Processing
Akarsh Nagaraj, Mayank Kejriwal

TL;DR
This study uses natural language processing to quantify gender disparity in pre-modern English literature. It finds that female characters are less prevalent than male characters, that the discrepancy is smaller when the author is female, and that the disparity stays relatively stable over the decades examined.
Contribution
It introduces a robust, transparent methodology using established NLP tools to measure gender disparity in historical literature at scale.
Findings
Female characters are less prevalent than male characters in pre-modern texts.
The discrepancy is smaller when the author is female.
Gender disparity remains relatively stable over the examined decades.
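As a toy illustration of what a prevalence comparison like the ones above can look like, the sketch below counts gendered pronouns in a passage and reports a female-to-male ratio. It is not the paper's methodology, which this summary does not describe in detail. The pronoun lists, the function names, and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical pronoun lists, chosen for illustration only;
# they are not taken from the paper.
FEMALE_PRONOUNS = {"she", "her", "hers", "herself"}
MALE_PRONOUNS = {"he", "him", "his", "himself"}


def pronoun_counts(text):
    """Return (female, male) pronoun counts for a piece of text."""
    # Lowercase and keep alphabetic runs as tokens.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    female = sum(counts[word] for word in FEMALE_PRONOUNS)
    male = sum(counts[word] for word in MALE_PRONOUNS)
    return female, male


def female_to_male_ratio(text):
    """Ratio of female to male pronouns; NaN if no male pronouns occur."""
    female, male = pronoun_counts(text)
    return female / male if male else float("nan")


# Made-up sample sentence, not drawn from any corpus.
sample = "He said his sister was late. She told him that her train was delayed."
female, male = pronoun_counts(sample)
print(female, male, female_to_male_ratio(sample))
```

A ratio below 1 would indicate fewer female than male pronoun mentions in the passage. Pronoun counting is only a crude proxy for character prevalence, so a real analysis would need more careful tooling and quality control.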
Abstract
Research has continued to shed light on the extent and significance of gender disparity in social, cultural and economic spheres. More recently, computational tools from the Natural Language Processing (NLP) literature have been proposed for measuring such disparity using relatively extensive datasets and empirically rigorous methodologies. In this paper, we contribute to this line of research by studying gender disparity, at scale, in copyright-expired literary texts published in the pre-modern period (defined in this work as the period ranging from the mid-nineteenth through the mid-twentieth century). One of the challenges in using such tools is to ensure quality control, and by extension, trustworthy statistical analysis. Another challenge is in using materials and methods that are publicly available and have been established for some time, both to ensure that they can be used and…
Taxonomy
Topics: Computational and Text Analysis Methods · Authorship Attribution and Profiling
