Shirtless and Dangerous: Quantifying Linguistic Signals of Gender Bias in an Online Fiction Writing Community
Ethan Fast, Tina Vachovsky, Michael S. Bernstein

TL;DR
This study uses NLP and crowdsourced data to analyze gender bias in 1.8 billion words of Wattpad fiction, revealing prevalent stereotypes and their correlation with story ratings across genres.
Contribution
Introduces a technique that combines natural language processing with a crowdsourced lexicon of stereotypes to quantify gender bias in large-scale online fiction datasets.
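To make the idea of pairing text processing with a stereotype lexicon concrete, here is a minimal sketch. It is not the authors' pipeline: the regex-based pronoun-verb extraction, the toy `LEXICON`, and the helper names `category_counts` and `male_to_female_ratio` are all assumptions made for illustration. A real system would presumably use a proper parser and a much larger, crowd-validated lexicon.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (stereotype category -> verbs); not taken from the paper.
LEXICON = {
    "violent": {"punched", "killed", "fought"},
    "submissive": {"whimpered", "begged", "blushed"},
}

# Crude stand-in for dependency parsing: a gendered pronoun followed by one word.
PRONOUN_VERB = re.compile(r"\b(he|she)\s+([a-z]+)\b", re.IGNORECASE)


def category_counts(text):
    """Count lexicon-category hits for verbs that directly follow 'he' or 'she'."""
    counts = {"he": Counter(), "she": Counter()}
    for pronoun, verb in PRONOUN_VERB.findall(text):
        for category, verbs in LEXICON.items():
            if verb.lower() in verbs:
                counts[pronoun.lower()][category] += 1
    return counts


def male_to_female_ratio(counts, category, smoothing=1.0):
    """Smoothed ratio of male to female hits for one category (assumed metric)."""
    return (counts["he"][category] + smoothing) / (counts["she"][category] + smoothing)


sample = "He punched the wall. She blushed and he fought on. She whimpered."
counts = category_counts(sample)
violent_ratio = male_to_female_ratio(counts, "violent")
```

A ratio above 1 for a category suggests it is attached to male characters more often in the sample; the additive smoothing only avoids division by zero and is a design choice of this sketch.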
Findings
Male over-representation and traditional stereotypes are widespread.
Certain stereotypes, like sexual or violent men, correlate with higher ratings.
Female authors are as likely as male authors to write these stereotypes.
Abstract
Imagine a princess asleep in a castle, waiting for her prince to slay the dragon and rescue her. Tales like the famous Sleeping Beauty clearly divide up gender roles. But what about more modern stories, born of a generation increasingly aware of social constructs like sexism and racism? Do these stories tend to reinforce gender stereotypes, or counter them? In this paper, we present a technique that combines natural language processing with a crowdsourced lexicon of stereotypes to capture gender biases in fiction. We apply this technique across 1.8 billion words of fiction from the Wattpad online writing community, investigating gender representation in stories, how male and female characters behave and are described, and how authors' use of gender stereotypes is associated with the community's ratings. We find that male over-representation and traditional gender stereotypes (e.g.,…
Taxonomy
Topics: Gender Studies in Language · Authorship Attribution and Profiling · Narrative Theory and Analysis
