ModelCitizens: Representing Community Voices in Online Safety

Ashima Suvarna; Christina Chance; Karolina Naranjo; Hamid Palangi; Sophie Hao; Thomas Hartvigsen; Saadia Gabriel

arXiv:2507.05455 · cs.CL · July 10, 2025

TL;DR

This paper introduces MODELCITIZENS, a community-informed dataset for toxicity detection on social media, highlighting the importance of diverse perspectives and context in improving moderation tools.

Contribution

It presents a new dataset with community-specific annotations, augmented conversational context, and fine-tuned models that outperform existing toxicity detection tools.

Findings

1. State-of-the-art toxicity detection tools underperform on community-informed data.

2. Detection accuracy degrades further on posts augmented with conversational context.

3. Fine-tuned models outperform GPT-o4-mini by 5.5%.

Abstract

Automatic toxic language detection is critical for creating safe, inclusive online spaces. However, it is a highly subjective task, with perceptions of toxic language shaped by community norms and lived experience. Existing toxicity detection models are typically trained on annotations that collapse diverse annotator perspectives into a single ground truth, erasing important context-specific notions of toxicity such as reclaimed language. To address this, we introduce MODELCITIZENS, a dataset of 6.8K social media posts and 40K toxicity annotations across diverse identity groups. To capture the role of conversational context on toxicity, typical of social media posts, we augment MODELCITIZENS posts with LLM-generated conversational scenarios. State-of-the-art toxicity detection tools (e.g. OpenAI Moderation API, GPT-o4-mini) underperform on MODELCITIZENS, with further degradation on…
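The abstract argues that collapsing every annotator's judgment into one ground-truth label can erase in-group readings such as reclaimed language. A toy example makes the effect concrete. The sketch below is hypothetical: the Annotation record, its fields, the group names, and the votes are invented for illustration and are not the MODELCITIZENS schema or the authors' annotation pipeline. It compares a single majority vote pooled over all annotators with a majority vote restricted to annotators from the group the post refers to.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Annotation:
    # Hypothetical record: the identity group an annotator belongs to,
    # and whether they judged the post toxic.
    annotator_group: str
    toxic: bool


def majority(labels) -> bool:
    # Strict majority of "toxic" votes; ties count as non-toxic
    # (an arbitrary choice for this sketch).
    counts = Counter(labels)
    return counts[True] > counts[False]


def pooled_label(annotations) -> bool:
    # Single "ground truth": every annotator's vote is collapsed together.
    return majority(a.toxic for a in annotations)


def ingroup_label(annotations, target_group):
    # Label from annotators who belong to the targeted group only;
    # None if no in-group annotations exist.
    votes = [a.toxic for a in annotations if a.annotator_group == target_group]
    return majority(votes) if votes else None


def labels_by_group(annotations) -> dict:
    # Keep each group's perspective separate instead of collapsing it.
    groups = {}
    for a in annotations:
        groups.setdefault(a.annotator_group, []).append(a.toxic)
    return {group: majority(votes) for group, votes in groups.items()}


if __name__ == "__main__":
    # Invented votes on a post that uses reclaimed language about group "A":
    # in-group annotators read it as non-toxic, out-group annotators as toxic.
    anns = [
        Annotation("A", False),
        Annotation("A", False),
        Annotation("B", True),
        Annotation("B", True),
        Annotation("C", True),
    ]
    print(pooled_label(anns))        # True: out-group votes dominate
    print(ingroup_label(anns, "A"))  # False: in-group reading is preserved
    print(labels_by_group(anns))     # {'A': False, 'B': True, 'C': True}
```

Here the pooled vote labels the post toxic, while the in-group vote does not. This is the kind of disagreement a single aggregated label hides.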

Code & Models

Datasets

modelcitizens/modelcitizens (dataset)

Videos

ModelCitizens: Representing Community Voices in Online Safety (Underline)

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Topic Modeling · Misinformation and Its Impacts