Gender Inequality in English Textbooks Around the World: an NLP Approach
Tairan Liu

TL;DR
This study uses NLP techniques to quantify and compare gender inequality in English textbooks from 22 countries, revealing consistent male overrepresentation across diverse cultural contexts.
Contribution
It introduces a cross-cultural NLP framework to measure gender bias in textbooks, combining multiple metrics and analyzing large language models' ability to detect gendered language.
Findings
Male characters are overrepresented in count, firstness, and named entities.
Gender inequality exists in all regions studied, with the Latin sphere showing the least disparity.
NLP methods can effectively quantify and analyze gender bias in educational texts.
Abstract
Textbooks play a critical role in shaping children's understanding of the world. While previous studies have identified gender inequality in individual countries' textbooks, few have examined the issue cross-culturally. This study applies natural language processing methods to quantify gender inequality in English textbooks from 22 countries across 7 cultural spheres. Metrics include character count, firstness (which gender is mentioned first), and TF-IDF word associations by gender. The analysis also identifies gender patterns in proper names appearing in TF-IDF word lists, tests whether large language models can distinguish between gendered word lists, and uses GloVe embeddings to examine how closely keywords associate with each gender. Results show consistent overrepresentation of male characters in terms of count, firstness, and named entities. All regions exhibit gender inequality,…
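The count and firstness metrics named in the abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the word lists `MALE` and `FEMALE`, the helper names, and the choice to measure firstness per sentence (counting only sentences that mention both genders) are not taken from the paper, whose exact lexicon and procedure are not given in this summary.

```python
import re
from collections import Counter

# Hypothetical gendered word lists; the paper's actual lexicon is not shown here.
MALE = {"he", "him", "his", "man", "men", "boy", "boys", "father", "brother", "mr"}
FEMALE = {"she", "her", "hers", "woman", "women", "girl", "girls",
          "mother", "sister", "mrs", "ms"}

def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def gender_counts(text):
    """Count tokens that fall in the male and female word lists."""
    counts = Counter()
    for tok in tokenize(text):
        if tok in MALE:
            counts["male"] += 1
        elif tok in FEMALE:
            counts["female"] += 1
    return counts

def firstness(text):
    """For sentences mentioning both genders, count which one appears first."""
    result = Counter()
    for sentence in re.split(r"[.!?]+", text):
        first_m = first_f = None
        for i, tok in enumerate(tokenize(sentence)):
            if tok in MALE and first_m is None:
                first_m = i
            elif tok in FEMALE and first_f is None:
                first_f = i
        if first_m is not None and first_f is not None:
            key = "male_first" if first_m < first_f else "female_first"
            result[key] += 1
    return result

# Made-up sample text, not drawn from any textbook in the study.
sample = ("He and his sister went to school. "
          "The girl and the boy read a book. Mr Lee smiled.")
counts = gender_counts(sample)
first = firstness(sample)
print(counts["male"], counts["female"])            # 4 2
print(first["male_first"], first["female_first"])  # 1 1
```

A real pipeline would likely need proper sentence segmentation and a way to handle names and ambiguous words such as "her"; this sketch only shows the shape of the two measurements.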
Taxonomy
Topics: Gender Studies in Language
Methods: GloVe Embeddings
