Measuring Misogyny in Natural Language Generation: Preliminary Results from a Case Study on two Reddit Communities
Aaron J. Snoswell, Lucinda Nelson, Hao Xue, Flora D. Salim, Nicolas Suzor, Jean Burgess

TL;DR
This paper examines the inadequacy of generic toxicity classifiers for detecting misogyny in language generation, proposing a misogyny-specific lexicon as a more effective benchmark based on a case study of Reddit communities.
Contribution
It introduces a misogyny-specific lexicon as a promising benchmark for evaluating language models, highlighting limitations of generic toxicity classifiers in this context.
Findings
Generic toxicity classifiers fail to distinguish misogyny levels in generated texts.
A misogyny-specific lexicon can reveal differences between communities.
The results highlight the need for specialized benchmarks in harm evaluation.
Abstract
Generic 'toxicity' classifiers continue to be used for evaluating the potential for harm in natural language generation, despite mounting evidence of their shortcomings. We consider the challenge of measuring misogyny in natural language generation, and argue that generic 'toxicity' classifiers are inadequate for this task. We use data from two well-characterised 'Incel' communities on Reddit that differ primarily in their degrees of misogyny to construct a pair of training corpora which we use to fine-tune two language models. We show that an open source 'toxicity' classifier is unable to distinguish meaningfully between generations from these models. We contrast this with a misogyny-specific lexicon recently proposed by feminist subject-matter experts, demonstrating that, despite the limitations of simple lexicon-based approaches, this shows promise as a benchmark to evaluate language…
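To make the lexicon-based comparison described in the abstract concrete, here is a minimal sketch of how generations from two fine-tuned models might be scored against a term list. It is not the authors' implementation. The function names (`compile_lexicon`, `lexicon_hit_rate`), the placeholder terms, and the choice of metric (share of generations containing at least one lexicon term) are all assumptions made for illustration; the actual lexicon and scoring procedure are not given in this summary.

```python
import re
from typing import Iterable, Pattern


def compile_lexicon(terms: Iterable[str]) -> Pattern[str]:
    """Build one case-insensitive, whole-word regex from a list of terms.

    Longer terms are placed first so multi-word entries are tried
    before any shorter entry that might be a prefix of them.
    """
    escaped = sorted((re.escape(t.lower()) for t in terms), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b")


def lexicon_hit_rate(texts: Iterable[str], pattern: Pattern[str]) -> float:
    """Return the fraction of texts containing at least one lexicon term."""
    texts = list(texts)
    if not texts:
        return 0.0
    hits = sum(1 for text in texts if pattern.search(text.lower()))
    return hits / len(texts)


# Hypothetical placeholder entries; a real study would load the
# expert-curated lexicon instead.
PLACEHOLDER_LEXICON = ["term_a", "term_b", "multi word term"]
pattern = compile_lexicon(PLACEHOLDER_LEXICON)

# Hypothetical stand-ins for text sampled from the two fine-tuned models.
generations_model_1 = ["this has term_a in it", "nothing here"]
generations_model_2 = ["nothing here", "still nothing"]

rate_1 = lexicon_hit_rate(generations_model_1, pattern)
rate_2 = lexicon_hit_rate(generations_model_2, pattern)
print(f"model 1 hit rate: {rate_1:.2f}, model 2 hit rate: {rate_2:.2f}")
```

A gap between the two hit rates is the kind of signal the abstract says the lexicon surfaces and the generic classifier does not. Whole-word matching like this misses misspellings, context, and reclaimed or quoted usage, which are among the limitations of simple lexicon-based approaches that the abstract acknowledges.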
Taxonomy
Topics: Hate Speech and Cyberbullying Detection
