A Dictionary-based Approach to Racism Detection in Dutch Social Media

Stéphan Tulkens; Lisa Hilte; Elise Lodewyckx; Ben Verhoeven; Walter Daelemans

arXiv:1608.08738 · cs.CL · September 1, 2016 · 75 citations

TL;DR

This paper introduces a dictionary-based method for detecting racist comments in Dutch social media. Manually curated and automatically expanded dictionaries supply the features for machine-learning classifiers; the best model reaches an F-score of 0.46 on racist comments.

Contribution

It presents a novel approach combining discourse dictionaries and Support Vector Machines for racism detection in Dutch social media comments, including a manual filtering process for dictionary expansion.
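The expansion-then-filter idea can be illustrated with a minimal sketch. It assumes nearest-neighbour lookup by cosine similarity over word vectors, which is a common way to expand a lexicon with an embedding model; the paper's exact procedure and parameters are not given here. The function names, the `top_n` parameter, and the toy vectors and placeholder words are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_category(seed_words, embeddings, top_n=2):
    """Add the top_n nearest neighbours of each seed word (assumed strategy)."""
    expanded = set(seed_words)
    for seed in seed_words:
        if seed not in embeddings:
            continue  # seed word has no vector, so it cannot be expanded
        neighbours = sorted(
            (w for w in embeddings if w != seed),
            key=lambda w: cosine(embeddings[seed], embeddings[w]),
            reverse=True,
        )
        expanded.update(neighbours[:top_n])
    return expanded


def filter_expansions(expanded, rejected):
    """Drop expansions that a human reviewer marked as incorrect."""
    return set(expanded) - set(rejected)


# Toy 2-d vectors with placeholder words; a real model would be trained
# on a large corpus and have far more dimensions.
toy_embeddings = {
    "word_a": [1.0, 0.0],
    "word_b": [0.9, 0.1],
    "word_c": [0.0, 1.0],
    "word_d": [0.8, 0.3],
}

expanded = expand_category({"word_a"}, toy_embeddings, top_n=2)
cleaned = filter_expansions(expanded, rejected={"word_d"})
```

Here `expanded` corresponds to an automatically expanded category and `cleaned` to the same category after manual filtering. With a trained model, a library lookup such as gensim's `KeyedVectors.most_similar` would typically replace the hand-written neighbour search.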

Findings

01

Best model achieved an F-score of 0.46 for racist comments.

02

Automated dictionary expansion did not significantly improve performance.

03

The expanded dictionaries covered more of the corpus, but the added coverage did not improve classification performance.
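For readers unfamiliar with the two quantities in these findings, the sketch below computes a per-class F-score and a simple dictionary coverage figure. Both definitions are assumptions for illustration: the F-score is the usual harmonic mean of precision and recall for one class, and coverage is taken here as the fraction of tokens found in the dictionary, which may differ from the paper's exact definition. The labels and tokens are toy data, not results from the paper.

```python
def f_score(gold, predicted, positive):
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def coverage(tokens, vocabulary):
    """Fraction of tokens that occur in the dictionary (assumed definition)."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in vocabulary) / len(tokens)


# Toy labels: one true positive, one false positive, one false negative.
gold = ["racist", "non-racist", "racist", "non-racist"]
predicted = ["racist", "racist", "non-racist", "non-racist"]
toy_f = f_score(gold, predicted, positive="racist")

# Toy coverage: two of four tokens are in the dictionary.
toy_cov = coverage(["a", "b", "c", "d"], {"a", "c"})
```

A dictionary can raise `coverage` without raising `f_score` if the newly matched words do not help separate the two classes, which is consistent with the pattern the findings describe.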

Abstract

We present a dictionary-based approach to racism detection in Dutch social media comments, which were retrieved from two public Belgian social media sites likely to attract racist reactions. These comments were labeled as racist or non-racist by multiple annotators. For our approach, three discourse dictionaries were created: first, we created a dictionary by retrieving possibly racist and more neutral terms from the training data, and then augmenting these with more general words to remove some bias. A second dictionary was created through automatic expansion using a word2vec model trained on a large corpus of general Dutch text. Finally, a third dictionary was created by manually filtering out incorrect expansions. We trained multiple Support Vector Machines, using the distribution of words over the different categories in the dictionaries as features. The best-performing…
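The feature representation mentioned in the abstract (the distribution of words over dictionary categories) can be sketched as follows. This is a minimal illustration under stated assumptions: each feature is taken to be the share of a comment's tokens that fall in one category, the category names and words are placeholders, and `category_features` is a hypothetical helper rather than code from the authors' repository.

```python
from collections import Counter


def category_features(tokens, dictionary):
    """Return one relative frequency per dictionary category.

    `dictionary` maps a category name to a set of words. Categories are
    sorted so the feature order is stable across comments.
    """
    categories = sorted(dictionary)
    counts = Counter()
    for token in tokens:
        for category in categories:
            if token in dictionary[category]:
                counts[category] += 1
    total = len(tokens) or 1  # avoid division by zero for empty comments
    return [counts[category] / total for category in categories]


# Placeholder dictionary with two categories (hypothetical content).
toy_dictionary = {
    "category_x": {"alpha", "beta"},
    "category_y": {"gamma"},
}

# Two of four tokens fall in category_x and one in category_y.
features = category_features("alpha gamma delta beta".split(), toy_dictionary)
```

Vectors of this kind, one per comment, could then be passed to a Support Vector Machine implementation such as scikit-learn's `LinearSVC` together with the racist / non-racist labels.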

Code & Models

Repositories

clips/hades (official)

Taxonomy

Topics: Natural Language Processing Techniques · Authorship Attribution and Profiling · Hate Speech and Cyberbullying Detection