CoRAL: a Context-aware Croatian Abusive Language Dataset
Ravi Shekhar, Mladen Karan, Matthew Purver

TL;DR
This paper introduces CoRAL, a Croatian dataset for abusive language detection that considers implicit and context-dependent content, highlighting challenges faced by current models in understanding subtle, culturally nuanced comments.
Contribution
The paper presents CoRAL, a novel, culturally aware Croatian abusive language dataset focusing on implicit and context-dependent content, and evaluates model performance on it.
Findings
Models perform worse on implicit comments
Context and language skill are crucial for accurate detection
Current models struggle with culturally nuanced content
Abstract
In light of unprecedented increases in the popularity of the internet and social media, comment moderation has never been a more relevant task. Semi-automated comment moderation systems greatly aid human moderators by either automatically classifying the examples or allowing the moderators to prioritize which comments to consider first. However, the concept of inappropriate content is often subjective, and such content can be conveyed in many subtle and indirect ways. In this work, we propose CoRAL -- a language- and culture-aware Croatian abusive language dataset covering the phenomena of implicitness and reliance on local and global context. We show experimentally that current models degrade when comments are not explicit and degrade further when language skill and context knowledge are required to interpret the comment.
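The comparison described in the abstract (model quality on explicit versus non-explicit comments) can be illustrated with a per-subset score. The sketch below is a minimal illustration, not the authors' evaluation code: the choice of macro-F1, the function names, and the toy examples are all assumptions made for this example, and the numbers are not results from the paper.

```python
from collections import defaultdict


def f1_for_class(gold, pred, cls):
    """F1 for a single class, computed from paired gold/predicted labels."""
    tp = sum(1 for g, p in zip(gold, pred) if g == cls and p == cls)
    fp = sum(1 for g, p in zip(gold, pred) if g != cls and p == cls)
    fn = sum(1 for g, p in zip(gold, pred) if g == cls and p != cls)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all labels seen in gold or pred."""
    classes = sorted(set(gold) | set(pred))
    return sum(f1_for_class(gold, pred, c) for c in classes) / len(classes)


def macro_f1_by_subset(examples):
    """Group (subset, gold, pred) triples by subset and score each group."""
    grouped = defaultdict(lambda: ([], []))
    for subset, gold, pred in examples:
        grouped[subset][0].append(gold)
        grouped[subset][1].append(pred)
    return {s: macro_f1(g, p) for s, (g, p) in grouped.items()}


# Hypothetical toy data: 1 = abusive, 0 = not abusive.
# The subset names mirror the abstract's explicit/implicit distinction.
examples = [
    ("explicit", 1, 1), ("explicit", 0, 0),
    ("explicit", 1, 1), ("explicit", 0, 0),
    ("implicit", 1, 0), ("implicit", 0, 0),
    ("implicit", 1, 1), ("implicit", 0, 0),
]

scores = macro_f1_by_subset(examples)
for subset, score in sorted(scores.items()):
    print(f"{subset}: macro-F1 = {score:.3f}")
```

Reporting one score per subset, rather than a single pooled score, keeps a drop on the harder implicit comments from being masked by strong performance on the explicit ones.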
Taxonomy
Topics: Hate Speech and Cyberbullying Detection
Methods: Correlation Alignment for Deep Domain Adaptation · Attentive Walk-Aggregating Graph Neural Network
