Automatically Inferring Gender Associations from Language
Serina Chang, Kathleen McKeown

TL;DR
This paper introduces a method for automatically inferring gender associations from language. It reveals domain-dependent differences in how women and men are discussed, and human evaluations show that it significantly outperforms strong baselines.
Contribution
It presents two new datasets and a novel approach for identifying and labeling gender-related semantic clusters in language data.
Findings
Large-scale gendered language differences across domains
Method outperforms baseline models in human evaluations
Differences vary between celebrity news and student reviews of computer science professors
Abstract
In this paper, we pose the question: do people talk about women and men in different ways? We introduce two datasets and a novel integration of approaches for automatically inferring gender associations from language, discovering coherent word clusters, and labeling the clusters for the semantic concepts they represent. The datasets allow us to compare how people write about women and men in two different settings - one set draws from celebrity news and the other from student reviews of computer science professors. We demonstrate that there are large-scale differences in the ways that people talk about women and men and that these differences vary across domains. Human evaluations show that our methods significantly outperform strong baselines.
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Topic Modeling · Authorship Attribution and Profiling
