ETHOS: an Online Hate Speech Detection Dataset
Ioannis Mollas, Zoe Chrysopoulou, Stamatis Karlos, Grigorios Tsoumakas

TL;DR
ETHOS is a new, validated dataset of YouTube and Reddit comments designed for hate speech detection, featuring binary and multi-label annotations created through an active sampling annotation protocol.
Contribution
This paper introduces ETHOS, a novel hate speech dataset with a unique annotation protocol, validated via crowdsourcing, to improve detection systems in social media contexts.
Findings
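The ETHOS dataset consists of comments collected from YouTube and Reddit.
Its binary and multi-label annotations were validated through crowdsourcing.
An active sampling annotation protocol was used to select comments for labeling, with the aim of producing balanced and accurate labels.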
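To make the idea of active sampling concrete, here is a minimal sketch of one common variant, uncertainty sampling, in which the comments a classifier is least sure about are sent to annotators first. This is an illustration under stated assumptions, not the authors' protocol: the function `select_for_annotation`, the `budget` parameter, and the toy scores are all hypothetical, and the summary above does not specify which selection rule ETHOS actually uses.

```python
from typing import Callable, List, Tuple


def select_for_annotation(
    comments: List[str],
    predict_proba: Callable[[str], float],
    budget: int,
) -> List[Tuple[str, float]]:
    """Return the `budget` comments whose predicted hate-speech
    probability is closest to 0.5, i.e. the most uncertain ones."""
    scored = [(c, predict_proba(c)) for c in comments]
    # A smaller distance from 0.5 means the model is less certain.
    scored.sort(key=lambda pair: abs(pair[1] - 0.5))
    return scored[:budget]


# Hypothetical stand-in for a trained classifier: fixed scores per comment.
toy_scores = {
    "comment a": 0.95,
    "comment b": 0.52,
    "comment c": 0.10,
    "comment d": 0.41,
}

picked = select_for_annotation(list(toy_scores), toy_scores.get, budget=2)
print([c for c, _ in picked])
```

In this toy run the two comments scored nearest to 0.5 are selected, while the confidently scored ones are left for later. A real pipeline would replace `toy_scores.get` with a model's probability output and retrain after each annotation round.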
Abstract
Online hate speech is a recent problem in our society that is rising at a steady pace by leveraging the vulnerabilities of the corresponding regimes that characterise most social media platforms. This phenomenon is primarily fostered by offensive comments, either during user interaction or in the form of a posted multimedia context. Nowadays, giant corporations own platforms where millions of users log in every day, and protection from exposure to similar phenomena appears to be necessary in order to comply with the corresponding legislation and maintain a high level of service quality. A robust and reliable system for detecting and preventing the uploading of relevant content will have a significant impact on our digitally interconnected society. Several aspects of our daily lives are undeniably linked to our social profiles, making us vulnerable to abusive behaviours. As a result, the…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Spam and Phishing Detection · Internet Traffic Analysis and Secure E-voting
Methods: 1x1 Convolution
