# The FRENK Datasets of Socially Unacceptable Discourse in Slovene and English

**Authors:** Nikola Ljubešić, Darja Fišer, Tomaž Erjavec

arXiv: 1906.02045 · 2019-06-14

## TL;DR

This paper introduces comparable datasets of Facebook comments in Slovene and English on two topics, migrants and LGBT, manually annotated for socially unacceptable discourse (SUD) to support cross-lingual analysis.

## Contribution

The paper presents novel, comparable datasets with a detailed annotation schema for SUD, enabling cross-lingual research and analysis in Slovene and English.

## Key findings

- Datasets cover two sensitive topics, migrants and LGBT.
- Annotation schema includes six types of SUD and five targets.
- The paper analyzes the annotation distributions and the inter-annotator agreement for both languages.
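
Inter-annotator agreement is commonly quantified with a chance-corrected statistic. As a minimal sketch, the following computes Cohen's kappa for two annotators. This summary does not say which statistic the paper uses, so the choice of kappa is an assumption, and the label names and toy annotations below are hypothetical rather than taken from the FRENK schema or data.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(labels_a)
    # Observed agreement: share of items that received identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy annotations (not taken from the FRENK datasets).
annotator_1 = ["acceptable", "offensive", "acceptable", "violent", "offensive", "acceptable"]
annotator_2 = ["acceptable", "offensive", "offensive", "violent", "offensive", "acceptable"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

Kappa is 1 for perfect agreement and 0 when agreement is no better than chance. With more than two annotators or missing annotations, a statistic such as Krippendorff's alpha may be a better fit.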

## Abstract

In this paper we present datasets of Facebook comment threads to mainstream media posts in Slovene and English developed inside the Slovene national project FRENK which cover two topics, migrants and LGBT, and are manually annotated for different types of socially unacceptable discourse (SUD). The main advantages of these datasets compared to the existing ones are identical sampling procedures, producing comparable data across languages and an annotation schema that takes into account six types of SUD and five targets at which SUD is directed. We describe the sampling and annotation procedures, and analyze the annotation distributions and inter-annotator agreements. We consider this dataset to be an important milestone in understanding and combating SUD for both languages.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.02045/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/1906.02045/full.md

## References

9 references — full list in the complete paper: https://tomesphere.com/paper/1906.02045/full.md

---
Source: https://tomesphere.com/paper/1906.02045