HatePRISM: Policies, Platforms, and Research Integration. Advancing NLP for Hate Speech Proactive Mitigation

Naquee Rizwan; Seid Muhie Yimam; Daryna Dementieva; Florian Skupin; Tim Fischer; Daniil Moskovskiy; Aarushi Ajay Borkar; Robert Geislinger; Punyajoy Saha; Sarthak Roy; Martin Semmann; Alexander Panchenko; Chris Biemann; Animesh Mukherjee

arXiv:2507.04350 · cs.CL · July 8, 2025

TL;DR

HatePRISM analyzes hate speech from three perspectives (country regulations, social platform policies, and NLP research datasets), highlights the inconsistencies among them, and suggests research directions toward a unified framework for proactive hate speech mitigation.

Contribution

HatePRISM provides an integrated examination of hate speech regulations, platform moderation policies, and research datasets, and suggests directions for developing a unified automated moderation framework.

Findings

01 Significant inconsistencies in hate speech definitions across jurisdictions.

02 Lack of alignment between platform policies and NLP research datasets.

03 Need for a unified framework for automated hate speech moderation.

Abstract

Despite regulations imposed by nations and social media platforms (e.g., Government of India, 2021; European Parliament and Council of the European Union, 2022, inter alia), hateful content persists as a significant challenge. Existing approaches rely primarily on reactive measures such as blocking or suspending offensive messages, while emerging strategies focus on proactive measures such as detoxification and counterspeech. In our work, which we call HatePRISM, we conduct a comprehensive examination of hate speech regulations and strategies from three perspectives: country regulations, social platform policies, and NLP research datasets. Our findings reveal significant inconsistencies in hate speech definitions and moderation practices across jurisdictions and platforms, alongside a lack of alignment with research efforts. Based on these insights, we suggest ideas and research…
