From Dogwhistles to Bullhorns: Unveiling Coded Rhetoric with Language Models
Julia Mendelsohn, Ronan Le Bras, Yejin Choi, Maarten Sap

TL;DR
This paper presents a large-scale computational study of dogwhistles: it develops a typology and a glossary of over 300 dogwhistles, analyzes their use in historical U.S. politicians' speeches, and evaluates GPT-3's ability to identify them. The results point to a risk that harmful content containing dogwhistles evades moderation.
Contribution
The paper presents the first large-scale computational investigation of dogwhistles, comprising a typology, a glossary of over 300 dogwhistles with contextual information and examples, and an evaluation of whether GPT-3 can identify dogwhistles and their meanings.
Findings
GPT-3's performance at identifying dogwhistles varies widely across dogwhistle types.
Harmful content with dogwhistles often evades toxicity detection.
The study provides resources for future NLP and social science research.
Abstract
Dogwhistles are coded expressions that simultaneously convey one meaning to a broad audience and a second one, often hateful or provocative, to a narrow in-group; they are deployed to evade both political repercussions and algorithmic content moderation. For example, in the sentence 'we need to end the cosmopolitan experiment,' the word 'cosmopolitan' likely means 'worldly' to many, but secretly means 'Jewish' to a select few. We present the first large-scale computational investigation of dogwhistles. We develop a typology of dogwhistles, curate the largest-to-date glossary of over 300 dogwhistles with rich contextual information and examples, and analyze their usage in historical U.S. politicians' speeches. We then assess whether a large language model (GPT-3) can identify dogwhistles and their meanings, and find that GPT-3's performance varies widely across types of dogwhistles and…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection
