Offensive Language and Hate Speech Detection for Danish

Gudbjartur Ingi Sigurbergsson; Leon Derczynski

arXiv:1908.04531 · cs.CL · March 24, 2023 · 23 cites

TL;DR

This paper introduces a new Danish dataset for offensive language detection on social media, develops multilingual classification systems, and evaluates their performance in identifying offensive content and its targets.

Contribution

The paper presents what the authors describe as the first Danish offensive language dataset, along with four classification systems designed for both English and Danish, and evaluation results for each.

Findings

01. Best Danish offensive language detection F1-score: 0.70

02. Target detection F1-score for Danish: 0.73

03. Multilingual systems perform effectively across languages
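
For readers unfamiliar with the metric behind these findings, the following is a minimal sketch of how an F1-score is computed for a single class from gold and predicted labels. The label names `OFF` and `NOT` and the toy data are hypothetical and do not come from the paper's dataset. This page also does not state which averaging scheme (for example macro-averaging over classes) the reported scores use, so the sketch covers only the single-class case.

```python
def f1_score(true_labels, predicted_labels, positive="OFF"):
    """Compute the F1-score for one class (hypothetical label name 'OFF')."""
    pairs = list(zip(true_labels, predicted_labels))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # With no true positives, precision or recall is zero, so F1 is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy labels, invented purely for illustration.
gold = ["OFF", "NOT", "OFF", "NOT", "OFF"]
pred = ["OFF", "OFF", "NOT", "NOT", "OFF"]
score = f1_score(gold, pred)
print(f"F1 for class OFF: {score:.2f}")
```

In this toy example there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3 and the F1-score is also 2/3.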

Abstract

The presence of offensive language on social media platforms, and the implications it poses, is becoming a major concern in modern society. Given the enormous amount of content created every day, automatic methods are required to detect and deal with this type of content. Until now, most research has focused on solving the problem for English, although the problem is multilingual. We construct a Danish dataset containing user-generated comments from Reddit and Facebook; to our knowledge, it is the first of its kind. Our dataset is annotated to capture various types and targets of offensive language. We develop four automatic classification systems, each designed to work for both English and Danish. In the detection of offensive language in English, the best…

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Swearing, Euphemism, Multilingualism · Internet Traffic Analysis and Secure E-voting