TL;DR
This study combines hate speech annotation, deep learning classification, and retweet network analysis to identify key sources and trends of hate speech on Slovenian Twitter over three years.
Contribution
It introduces a comprehensive approach integrating classification models and community detection to identify main sources of hate speech and analyze their evolution.
Findings
Hate speech is dominated by offensive tweets related to political and ideological issues.
The share of unacceptable tweets rose from 20% to 30% over the three years, ending in 2020.
Most unacceptable tweets originate from anonymous or suspended accounts.
Abstract
We address the challenging problem of identifying the main sources of hate speech on Twitter. On the one hand, we carefully annotate a large set of tweets for hate speech and deploy advanced deep learning to produce high-quality hate speech classification models. On the other hand, we create retweet networks, detect communities, and monitor their evolution through time. This combined approach is applied to three years of Slovenian Twitter data. Hate speech is dominated by offensive tweets related to political and ideological issues. The share of unacceptable tweets increases moderately with time, from an initial 20% to 30% by the end of 2020. Unacceptable tweets are retweeted significantly more often than acceptable tweets. About 60% of unacceptable tweets are produced by a single right-wing community of only moderate size. Institutional Twitter…
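The combined approach described in the abstract (label tweets, group users via the retweet network, then attribute unacceptable tweets to communities) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the toy data, the function names, and the use of connected components as a crude stand-in for community detection. The abstract does not specify the detection algorithm or the classifier, and the labels below are hand-written rather than model output.

```python
from collections import defaultdict, deque

def find_communities(retweets):
    """Assign each user a community id.

    Stand-in only: communities are the connected components of the
    undirected retweet graph, not the method used in the paper.
    """
    neighbors = defaultdict(set)
    for retweeter, author in retweets:
        neighbors[retweeter].add(author)
        neighbors[author].add(retweeter)

    community_of = {}
    next_id = 0
    for start in sorted(neighbors):
        if start in community_of:
            continue
        # Breadth-first search labels everyone reachable from `start`.
        community_of[start] = next_id
        queue = deque([start])
        while queue:
            user = queue.popleft()
            for other in neighbors[user]:
                if other not in community_of:
                    community_of[other] = next_id
                    queue.append(other)
        next_id += 1
    return community_of

def unacceptable_share_by_community(tweets, community_of):
    """Return each community's fraction of all unacceptable tweets."""
    counts = defaultdict(int)
    total = 0
    for author, label in tweets:
        if label == "unacceptable" and author in community_of:
            counts[community_of[author]] += 1
            total += 1
    if total == 0:
        return {}
    return {community: n / total for community, n in counts.items()}

# Hypothetical toy data: (retweeter, original author) pairs.
retweets = [("a", "b"), ("b", "c"), ("d", "e")]
# Hypothetical labels, as a classifier might assign them.
tweets = [
    ("a", "unacceptable"),
    ("b", "unacceptable"),
    ("c", "acceptable"),
    ("d", "unacceptable"),
    ("e", "acceptable"),
]

community_of = find_communities(retweets)
shares = unacceptable_share_by_community(tweets, community_of)
for community, share in sorted(shares.items()):
    print(f"community {community}: {share:.0%} of unacceptable tweets")
```

A figure such as "about 60% of unacceptable tweets come from one community" corresponds to one entry of `shares` in this sketch; a real analysis would replace the connected-components step with a proper community detection algorithm and the hand-written labels with classifier predictions.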
