A Weakly Supervised Classifier and Dataset of White Supremacist Language
Michael Miller Yoder, Ahmad Diab, David West Brown, Kathleen M. Carley

TL;DR
This paper introduces a dataset and a weakly supervised classifier designed to detect white supremacist language online, leveraging domain-specific data and anti-racist texts to improve accuracy and reduce bias.
Contribution
It presents a novel weakly supervised approach and dataset for identifying white supremacist language, enhancing generalization and bias mitigation in hate speech detection.
Findings
Improved cross-domain detection accuracy
Effective bias reduction through anti-racist counterexamples
Demonstrated robustness of the classifier across datasets
Abstract
We present a dataset and classifier for detecting the language of white supremacist extremism, a growing issue in online hate speech. Our weakly supervised classifier is trained on large datasets of text from explicitly white supremacist domains paired with neutral and anti-racist data from similar domains. We demonstrate that this approach improves generalization performance to new domains. Incorporating anti-racist texts as counterexamples to white supremacist language mitigates bias.
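The weak supervision described in the abstract can be illustrated with a minimal sketch, assuming labels are assigned purely from where a text was collected rather than from per-post human annotation. The category names (`white_supremacist`, `neutral`, `anti_racist`), the `Example` type, and the helper functions are hypothetical and chosen for illustration; they are not taken from the authors' code, and the texts are placeholders.

```python
from dataclasses import dataclass

# Hypothetical source categories (assumption): the abstract only says that
# white supremacist domains are paired with neutral and anti-racist data.
POSITIVE_SOURCES = {"white_supremacist"}
NEGATIVE_SOURCES = {"neutral", "anti_racist"}


@dataclass
class Example:
    text: str
    source_type: str  # provenance of the post, not a human annotation


def weak_label(example):
    """Return 1 or 0 from provenance alone, or None if the source is unknown."""
    if example.source_type in POSITIVE_SOURCES:
        return 1
    if example.source_type in NEGATIVE_SOURCES:
        return 0
    return None


def build_training_set(examples):
    """Keep only examples whose source maps to a weak label."""
    labeled = []
    for ex in examples:
        label = weak_label(ex)
        if label is not None:
            labeled.append((ex.text, label))
    return labeled


# Placeholder corpus; real data would be posts scraped from each kind of source.
corpus = [
    Example("<post from an explicitly white supremacist domain>", "white_supremacist"),
    Example("<post from a general-interest forum>", "neutral"),
    Example("<post from an anti-racist community>", "anti_racist"),
    Example("<post of unknown provenance>", "unknown"),
]

training_set = build_training_set(corpus)
labels = [label for _, label in training_set]
```

In this sketch, anti-racist texts land in the negative class alongside neutral ones, so a classifier trained on the result sees counterexamples that discuss race without being white supremacist, which is the bias-mitigation idea the abstract describes.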
Taxonomy
Topics: Hate Speech and Cyberbullying Detection
