The Constant in HATE: Analyzing Toxicity in Reddit across Topics and Languages

Wondimagegnhue Tsegaye Tufa; Ilia Markov; Piek Vossen

arXiv:2404.18726 · cs.CL · April 30, 2024 · 1 citation

TL;DR

This study analyzes toxicity in Reddit comments across multiple languages and topics, revealing patterns of increased toxicity related to specific subjects and notable variations among language communities.

Contribution

It provides a comprehensive cross-lingual, cross-topic analysis of toxicity patterns on Reddit using a large multilingual dataset.

Findings

01. Toxicity spikes vary by topic and language

02. Certain topics consistently show higher toxicity levels

03. Significant within-language community variations observed

Abstract

Toxic language remains an ongoing challenge on social media platforms, presenting significant issues for users and communities. This paper provides a cross-topic and cross-lingual analysis of toxicity in Reddit conversations. We collect 1.5 million comment threads from 481 communities in six languages: English, German, Spanish, Turkish, Arabic, and Dutch, covering 80 topics such as Culture, Politics, and News. We thoroughly analyze how toxicity spikes within different communities in relation to specific topics. We observe consistent patterns of increased toxicity across languages for certain topics, while also noting significant variations within specific language communities.
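
The kind of cross-topic, cross-lingual comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the records, the score scale, the function names, and the rule "a topic is elevated when its mean exceeds the language's average over topics" are all assumptions made for illustration, and the numbers are invented toy values rather than results from the paper.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records: (language, topic, toxicity score in [0, 1]).
# These values are invented for illustration and do not come from the paper.
comments = [
    ("en", "Politics", 0.62),
    ("en", "Politics", 0.48),
    ("en", "Culture", 0.10),
    ("de", "Politics", 0.55),
    ("de", "Culture", 0.20),
    ("de", "Culture", 0.30),
]


def mean_toxicity_by_language_topic(records):
    """Average the toxicity scores within each (language, topic) group."""
    groups = defaultdict(list)
    for language, topic, score in records:
        groups[(language, topic)].append(score)
    return {key: mean(scores) for key, scores in groups.items()}


def topics_elevated_in_all_languages(table):
    """Return topics whose mean exceeds the language baseline in every language.

    The baseline is an assumed heuristic: the average of a language's
    per-topic means.
    """
    by_language = defaultdict(dict)
    for (language, topic), value in table.items():
        by_language[language][topic] = value

    elevated_sets = []
    for topic_means in by_language.values():
        baseline = mean(topic_means.values())
        elevated_sets.append(
            {topic for topic, value in topic_means.items() if value > baseline}
        )
    return set.intersection(*elevated_sets) if elevated_sets else set()


table = mean_toxicity_by_language_topic(comments)
shared = topics_elevated_in_all_languages(table)
print(sorted(shared))
```

Comparing each topic against a per-language baseline, rather than a single global threshold, keeps a language with generally higher scores from dominating the list of elevated topics.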

Code & Models

Repositories

cltl/reddit_topic (official)

Taxonomy

Topics: Spam and Phishing Detection · Hate Speech and Cyberbullying Detection · Sentiment Analysis and Opinion Mining