Hate Speech Detection Using Cross-Platform Social Media Data In English and German Language
Gautam Kishore Shahi, Tim A. Majchrzak

TL;DR
This study investigates how combining cross-platform social media data in English and German improves hate speech detection models, highlighting the benefits of multi-source datasets for more accurate classification.
Contribution
It demonstrates that integrating datasets from multiple social media platforms enhances hate speech detection accuracy in bilingual contexts.
Findings
Adding similar datasets improves model performance.
Combining YouTube, Twitter, and Gab data yields the highest F1-scores.
Cross-platform data integration is effective for hate speech detection.
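The findings above are reported as F1-scores. As a reminder of what that metric measures, here is a minimal sketch that computes F1 for a single positive class from gold and predicted labels. The function name `f1_score_binary` and the 0/1 label encoding are assumptions for illustration; the summary does not say how the paper computes or averages its scores.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """F1 for one class: harmonic mean of precision and recall.

    Hypothetical helper; label encoding (1 = hate, 0 = not hate) is assumed.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined),
        # so F1 is conventionally reported as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels, not data from the paper.
gold = [1, 0, 1, 1, 0]
pred = [1, 0, 0, 1, 1]
score = f1_score_binary(gold, pred)
```

Comparing this score for a model trained on one platform against a model trained on the combined data is one way to read the kind of comparison the findings describe.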
Abstract
Hate speech has grown into a pervasive phenomenon, intensifying during times of crisis, elections, and social unrest. Multiple approaches have been developed to detect hate speech using artificial intelligence, but a generalized model has yet to be achieved. A key challenge for hate speech detection as a text classification task is the cost of obtaining high-quality training data. This study focuses on detecting bilingual hate speech in YouTube comments and measuring the impact of additional data from other platforms on the performance of the classification model. We examine the value of additional cross-platform training datasets for improving the performance of classification models. We also include factors such as content similarity, definition similarity, and common hate words to measure the impact of datasets on performance. Our findings show that adding more similar datasets…
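The abstract names content similarity between datasets as one factor but, as excerpted here, does not say how it is measured. One simple proxy, shown purely as an assumption and not as the authors' method, is the Jaccard overlap between the vocabularies of two datasets. The helper names `vocabulary` and `jaccard_similarity` are hypothetical.

```python
import re


def vocabulary(texts):
    """Collect the set of lowercased word tokens across a list of texts."""
    tokens = set()
    for text in texts:
        # \w matches Unicode letters in Python 3, so German umlauts are kept.
        tokens.update(re.findall(r"\w+", text.lower()))
    return tokens


def jaccard_similarity(texts_a, texts_b):
    """Vocabulary overlap: |A intersect B| / |A union B|, in [0, 1]."""
    vocab_a, vocab_b = vocabulary(texts_a), vocabulary(texts_b)
    union = vocab_a | vocab_b
    if not union:
        return 0.0
    return len(vocab_a & vocab_b) / len(union)


# Toy comments, not data from the paper.
platform_a = ["You are great", "great video"]
platform_b = ["great video thanks"]
overlap = jaccard_similarity(platform_a, platform_b)
```

A higher overlap suggests the two sources share more surface vocabulary. Whether that tracks the similarity notion used in the paper is not something this summary confirms.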
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Freedom of Expression and Defamation
