Detecting Cross-Geographic Biases in Toxicity Modeling on Social Media
Sayan Ghosh, Dylan Baker, David Jurgens, Vinodkumar Prabhakaran

TL;DR
This paper presents a weakly supervised approach to detect and analyze geographic biases in toxicity detection models on social media, highlighting disparities in non-Western contexts and offering preliminary insights into mitigation strategies.
Contribution
Introduces a novel weakly supervised method for identifying geographic biases in toxicity models, expanding bias analysis beyond Western-centric datasets.
Findings
Identifies salient cross-geographic error groups in toxicity detection
Shows that bias groupings align with human judgments of offensive language
Provides preliminary insights into bias mitigation strategies
Abstract
Online social media platforms increasingly rely on Natural Language Processing (NLP) techniques to detect abusive content at scale in order to mitigate the harms it causes to their users. However, these techniques suffer from various sampling and association biases present in training data, often resulting in sub-par performance on content relevant to marginalized groups, potentially furthering disproportionate harms towards them. Studies on such biases so far have focused on only a handful of axes of disparities and subgroups that have annotations/lexicons available. Consequently, biases concerning non-Western contexts are largely ignored in the literature. In this paper, we introduce a weakly supervised method to robustly detect lexical biases in broader geocultural contexts. Through a case study on a publicly available toxicity detection model, we demonstrate that our method…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Terrorism, Counterterrorism, and Political Violence
