Hate Speech Detection Using Cross-Platform Social Media Data In English and German Language

Gautam Kishore Shahi; Tim A. Majchrzak

arXiv:2410.05287·cs.CL·October 10, 2024

TL;DR

This study investigates how combining cross-platform social media data in English and German improves hate speech detection models, highlighting the benefits of multi-source datasets for more accurate classification.

Contribution

The study demonstrates that integrating datasets from multiple social media platforms improves hate speech detection accuracy in bilingual (English and German) contexts.

Findings

01. Adding similar datasets improves model performance.

02. Combining YouTube, Twitter, and Gab data yields the highest F1-scores.

03. Cross-platform data integration is effective for hate speech detection.

Abstract

Hate speech has grown into a pervasive phenomenon, intensifying during times of crisis, elections, and social unrest. Multiple approaches have been developed to detect hate speech using artificial intelligence, but a generalized model has yet to be achieved. A key challenge in treating hate speech detection as text classification is the cost of obtaining high-quality training data. This study focuses on detecting bilingual hate speech in YouTube comments and measuring the impact of additional data from other platforms on the performance of the classification model. We examine the value of additional cross-platform training datasets for improving the performance of classification models. We also include factors such as content similarity, definition similarity, and common hate words to measure the impact of datasets on performance. Our findings show that adding more similar datasets…
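The abstract names content similarity and common hate words as factors for comparing datasets, but this excerpt does not say how they are computed. As a minimal sketch, assuming vocabulary-level Jaccard overlap as the similarity measure and a user-supplied lexicon (the function names, the placeholder tokens `slur_a` and `slur_b`, and the toy comments are all hypothetical and not taken from the paper), such a comparison between two platforms' datasets might look like this:

```python
import re


def tokens(text):
    """Lowercase a comment and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def vocabulary(comments):
    """Union of the word tokens across all comments in a dataset."""
    vocab = set()
    for comment in comments:
        vocab |= tokens(comment)
    return vocab


def jaccard(a, b):
    """Jaccard overlap |a & b| / |a | b| between two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def common_lexicon_terms(comments_a, comments_b, lexicon):
    """Lexicon terms that occur in both datasets (assumed proxy for 'common hate words')."""
    lex = {word.lower() for word in lexicon}
    return vocabulary(comments_a) & vocabulary(comments_b) & lex


# Toy data with placeholder tokens standing in for real lexicon entries.
youtube_comments = ["you are a slur_a", "nice video"]
gab_comments = ["slur_a and slur_b everywhere"]
lexicon = ["slur_a", "slur_b"]

similarity = jaccard(vocabulary(youtube_comments), vocabulary(gab_comments))
shared = common_lexicon_terms(youtube_comments, gab_comments, lexicon)
print(f"vocabulary Jaccard: {similarity:.3f}, shared lexicon terms: {sorted(shared)}")
```

Under this assumed measure, a higher overlap would mark an external dataset as a closer match to the YouTube target data. Other measures, such as embedding-based similarity, could be substituted without changing the overall structure.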

Code & Models

Repositories

Gautamshahi/BilingualYouTubeHateSpeech (Official)

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Freedom of Expression and Defamation