Offensive Language and Hate Speech Detection for Danish
Gudbjartur Ingi Sigurbergsson, Leon Derczynski

TL;DR
This paper introduces a new Danish dataset for offensive language detection on social media, develops classification systems that work for both English and Danish, and evaluates how well they identify offensive content and its targets.
Contribution
It presents what the authors describe as the first Danish offensive language dataset, along with four automatic classification systems designed for both English and Danish, and evaluation results for each.
Findings
Best Danish offensive language detection macro-averaged F1-score: 0.70
Best Danish target detection macro-averaged F1-score: 0.73
Multilingual systems perform effectively across languages
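F1-scores for classification tasks like these are commonly reported as macro averages, where the F1 of each class is computed separately and then averaged with equal weight. The sketch below illustrates that calculation on made-up labels; the label names `OFF` and `NOT`, the helper names, and the toy data are assumptions for illustration and are not taken from the paper or its dataset.

```python
def f1_for_label(gold, pred, label):
    """F1-score for a single class, treating `label` as the positive class."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    return sum(f1_for_label(gold, pred, label) for label in labels) / len(labels)


# Hypothetical toy data: OFF = offensive, NOT = not offensive.
gold = ["OFF", "NOT", "NOT", "OFF", "NOT", "NOT"]
pred = ["OFF", "NOT", "OFF", "NOT", "NOT", "NOT"]
score = macro_f1(gold, pred)
print(f"macro F1 = {score:.3f}")
```

Because each class counts equally, a system cannot reach a high macro-averaged score simply by predicting the majority class, which matters when offensive posts are a minority of the data.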
Abstract
The presence of offensive language on social media platforms and the implications it poses are becoming a major concern in modern society. Given the enormous amount of content created every day, automatic methods are required to detect and deal with this type of content. Until now, most research has focused on solving the problem for English, although the problem is multilingual. We construct a Danish dataset containing user-generated comments from Reddit and Facebook; to our knowledge, it is the first of its kind. Our dataset is annotated to capture various types and targets of offensive language. We develop four automatic classification systems, each designed to work for both English and Danish. In the detection of offensive language in English, the best…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Swearing, Euphemism, Multilingualism · Internet Traffic Analysis and Secure E-voting
