Comparative Studies of Detecting Abusive Language on Twitter

Younghun Lee; Seunghyun Yoon; Kyomin Jung

arXiv:1808.10245 · cs.CL · August 31, 2018

TL;DR

This paper conducts a comprehensive comparison of different models for detecting abusive language on Twitter, utilizing a large, reliable dataset and exploring additional features to improve accuracy.

Contribution

It is the first study to systematically compare learning models on the Hate and Abusive Speech on Twitter dataset, and it discusses how additional features and context data could improve them.

Findings

1. A bidirectional GRU trained on word-level features, with Latent Topic Clustering modules, is the most accurate model, with an F1 score of 0.805.
2. Additional features and context data are discussed as possible ways to further improve model performance.
3. The study highlights the dataset's potential for advancing abusive language detection.
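The 0.805 figure is an F1 score over a multi-class labelling. As a minimal sketch of how such a number is computed, the Python below derives per-class F1 as 2PR/(P+R) from precision P and recall R, then averages it weighted by each class's support. The weighting scheme and the four label names (normal, spam, abusive, hateful) are assumptions for illustration, and the toy labels are made up rather than taken from the paper.

```python
from collections import Counter


def f1_per_class(y_true, y_pred, label):
    """F1 for one label, treating it as the positive class (one-vs-rest)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def weighted_f1(y_true, y_pred):
    """Average per-class F1, weighting each class by its share of y_true (assumed scheme)."""
    support = Counter(y_true)
    total = len(y_true)
    return sum(f1_per_class(y_true, y_pred, c) * n / total for c, n in support.items())


# Hypothetical toy labels, not data from the paper.
y_true = ["normal", "normal", "spam", "abusive", "hateful", "normal"]
y_pred = ["normal", "spam", "spam", "abusive", "abusive", "normal"]
print(round(weighted_f1(y_true, y_pred), 3))  # 0.622
```

Weighting by support keeps the large "normal" class from being swamped by rare classes, but it also means a model can score well while doing poorly on a small class such as "hateful" (F1 of 0 in the toy example).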

Abstract

The context-dependent nature of online aggression makes annotating large collections of data extremely difficult. Previously studied datasets in abusive language detection have been too small to efficiently train deep learning models. Recently, Hate and Abusive Speech on Twitter, a dataset much greater in size and reliability, has been released. However, this dataset has not yet been studied to its full potential. In this paper, we conduct the first comparative study of various learning models on Hate and Abusive Speech on Twitter, and discuss the possibility of using additional features and context data for improvements. Experimental results show that bidirectional GRU networks trained on word-level features, with Latent Topic Clustering modules, are the most accurate model, scoring 0.805 F1.
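To make the best-performing architecture concrete, here is a hypothetical NumPy sketch of a forward pass through a bidirectional GRU over word embeddings followed by a Latent Topic Clustering (LTC) step. It assumes LTC scores the sentence encoding against K learned topic vectors, softmax-normalises those scores, and appends the resulting weighted topic vector to the encoding before classification. All sizes, the random initialisation, the GRU gate convention, the label names and the function names (`GRU`, `latent_topic_clustering`, `classify`) are illustrative choices, not the authors' implementation. The weights are untrained, so the output probabilities carry no meaning.

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GRU:
    """Single-layer GRU with random weights (forward pass only, illustrative)."""

    def __init__(self, input_dim, hidden_dim, rng):
        # Stacked weights for the update gate z, reset gate r and candidate state n.
        self.W = rng.normal(0, 0.1, (3, hidden_dim, input_dim))
        self.U = rng.normal(0, 0.1, (3, hidden_dim, hidden_dim))
        self.b = np.zeros((3, hidden_dim))
        self.hidden_dim = hidden_dim

    def run(self, xs):
        """Return the final hidden state after reading xs, shape (T, input_dim)."""
        h = np.zeros(self.hidden_dim)
        for x in xs:
            z = sigmoid(self.W[0] @ x + self.U[0] @ h + self.b[0])
            r = sigmoid(self.W[1] @ x + self.U[1] @ h + self.b[1])
            n = np.tanh(self.W[2] @ x + self.U[2] @ (r * h) + self.b[2])
            h = (1 - z) * h + z * n
        return h


def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()


def latent_topic_clustering(h, memory):
    """Soft-assign h to K topic vectors (rows of memory); append the weighted topic summary."""
    p = softmax(memory @ h)          # (K,) similarity of h to each topic
    topic = p @ memory               # (D,) probability-weighted sum of topic vectors
    return np.concatenate([h, topic])


def classify(token_ids, params):
    xs = params["embedding"][token_ids]                    # (T, E) word-level features
    h = np.concatenate([params["fwd"].run(xs),             # left-to-right pass
                        params["bwd"].run(xs[::-1])])      # right-to-left pass
    features = latent_topic_clustering(h, params["topics"])
    return softmax(params["W_out"] @ features + params["b_out"])


LABELS = ["normal", "spam", "abusive", "hateful"]  # assumed label set
VOCAB, EMB, HID, K = 50, 16, 8, 5                  # arbitrary toy sizes
params = {
    "embedding": rng.normal(0, 0.1, (VOCAB, EMB)),
    "fwd": GRU(EMB, HID, rng),
    "bwd": GRU(EMB, HID, rng),
    "topics": rng.normal(0, 0.1, (K, 2 * HID)),
    "W_out": rng.normal(0, 0.1, (len(LABELS), 4 * HID)),
    "b_out": np.zeros(len(LABELS)),
}
probs = classify(np.array([3, 17, 42, 8]), params)  # hypothetical token ids
print(dict(zip(LABELS, probs.round(3))))
```

In a real model, the embeddings, both GRUs, the topic memory and the output layer would be learned jointly by backpropagation on labelled tweets.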
