Web(er) of Hate: A Survey on How Hate Speech Is Typed
Luna Wang, Andrew Caines, Alice Hutchings

TL;DR
This survey critically examines the methodological choices in hate speech datasets, emphasizing the importance of reflexivity and transparency to improve dataset reliability and validity.
Contribution
It introduces a reflexive approach to dataset creation, encouraging researchers to acknowledge their value judgments and improve methodological rigour in hate speech dataset curation.
Findings
Identifies common design themes in hate speech datasets
Discusses the implications of methodological choices for dataset reliability
Advocates for reflexive and transparent dataset creation processes
Abstract
The curation of hate speech datasets involves complex design decisions that balance competing priorities. This paper critically examines these methodological choices in a diverse range of datasets, highlighting common themes and practices, and their implications for dataset reliability. Drawing on Max Weber's notion of ideal types, we argue for a reflexive approach in dataset creation, urging researchers to acknowledge their own value judgments during dataset construction, fostering transparency and methodological rigour.
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Populism, Right-Wing Movements · Computational and Text Analysis Methods
