Multilingual Content Moderation: A Case Study on Reddit

Meng Ye; Karan Sikka; Katherine Atwell; Sabit Hassan; Ajay Divakaran; Malihe Alikhani

arXiv:2302.09618 · cs.CL · February 21, 2023

TL;DR

This paper introduces a large multilingual Reddit dataset to analyze content moderation challenges, emphasizing the need for adaptive, rule-based AI moderation across languages and communities, and exploring related research problems.

Contribution

It provides a novel multilingual dataset of Reddit comments and analyzes key challenges in rule-based content moderation across diverse languages and communities.

Findings

1. Highlighting the complexity of rule-based moderation across languages
2. Identifying challenges in cross-lingual transfer and label noise
3. Proposing research directions for improved AI moderation

Abstract

Content moderation is the process of flagging content based on pre-defined platform rules. There has been a growing need for AI moderators to safeguard users as well as protect the mental health of human moderators from traumatic content. While prior works have focused on identifying hateful/offensive language, they are not adequate for meeting the challenges of content moderation since 1) moderation decisions are based on violation of rules, which subsumes detection of offensive speech, and 2) such rules often differ across communities, which entails an adaptive solution. We propose to study the challenges of content moderation by introducing a multilingual dataset of 1.8 million Reddit comments spanning 56 subreddits in English, German, Spanish and French. We perform extensive experimental analysis to highlight the underlying challenges and suggest related research problems such as…

Code & Models

Repositories

mye1225/multilingual_content_mod
Official

Taxonomy

Topics: Hate Speech and Cyberbullying Detection