Offensive Language Analysis using Deep Learning Architecture

Ryan Ong

arXiv:1903.05280 · cs.CL · March 20, 2019 · 5 cites

TL;DR

This paper explores deep learning models, especially BiLSTM-CNN, for offensive language detection in social media, experimenting with data balancing techniques and model variations to optimize performance.

Contribution

It systematically evaluates RNN-CNN variations and preprocessing methods, identifying the most effective architecture for offensive language classification.
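As a rough illustration of what a BiLSTM-CNN classifier can look like, here is a minimal Keras sketch: an embedding layer feeds a bidirectional LSTM, whose per-token outputs pass through a 1D convolution and global max pooling before a softmax layer. The function name `build_bilstm_cnn` and every hyperparameter (vocabulary size, sequence length, unit and filter counts, kernel size) are assumptions made for this example, not values taken from the paper.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_bilstm_cnn(vocab_size=20000, max_len=64, embed_dim=100,
                     lstm_units=64, filters=64, kernel_size=3, n_classes=2):
    """Hypothetical BiLSTM -> CNN text classifier (all sizes are assumed)."""
    # Input: a batch of padded token-id sequences of length max_len.
    inputs = keras.Input(shape=(max_len,), dtype="int32")
    x = layers.Embedding(vocab_size, embed_dim)(inputs)
    # return_sequences=True keeps one vector per token so the CNN can slide over them.
    x = layers.Bidirectional(layers.LSTM(lstm_units, return_sequences=True))(x)
    # 1D convolution over the sequence of BiLSTM states, then pool to a fixed size.
    x = layers.Conv1D(filters, kernel_size, activation="relu")(x)
    x = layers.GlobalMaxPooling1D()(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    return model
```

Swapping the order of the recurrent and convolutional layers would give a CNN-LSTM style variant of the kind Sosa (2017) compared against LSTM-CNN.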

Findings

01 BiLSTM-CNN achieves the highest macro F1-score.

02 SMOTE and Class Weights are used to counter the imbalance between classes.

03 Adding CNN layers sometimes decreases performance.
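The findings are reported in terms of macro F1, which averages per-class F1 scores without weighting by class size, so the minority class counts as much as the majority class. The sketch below computes it in plain Python; the function name `macro_f1` and the label strings in the usage note are illustrative.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

For example, with 8 `"NOT"` and 2 `"OFF"` gold labels, a classifier that always predicts `"NOT"` reaches 80% accuracy but a macro F1 of only about 0.44, which is why the metric suits imbalanced data.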

Abstract

SemEval-2019 Task 6 (Zampieri et al., 2019b) requires us to identify and categorise offensive language in social media. In this paper we describe the process we took to tackle this challenge. Our process is heavily inspired by Sosa (2017), who proposed CNN-LSTM and LSTM-CNN models for Twitter sentiment analysis. We decided to follow his approach and to further his work by testing different variations of RNN models with CNN. Specifically, we divided the challenge into two parts: data processing and sampling, and choosing the optimal deep learning architecture. In preprocessing, we experimented with two techniques, SMOTE and Class Weights, to counter the imbalance between classes. Once we were happy with the quality of our input data, we proceeded to choose the optimal deep learning architecture for this task. Given the quality and quantity of data we have been…
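The abstract does not say how the class weights were derived. One common choice, shown here purely as an assumption, is the "balanced" heuristic, where each class gets weight `n_samples / (n_classes * class_count)` so that rarer classes contribute more to the loss. The helper name `balanced_class_weights` is hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency (assumed 'balanced' heuristic)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    # A class holding exactly 1/n_classes of the data gets weight 1.0.
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}
```

With 8 majority and 2 minority examples this gives weights of 0.625 and 2.5 respectively. A dictionary of this form, keyed by integer class index, is the kind of mapping Keras accepts through the `class_weight` argument of `Model.fit`.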

Code & Models

Repositories

RyanOngAI/semeval-2019-task6
tf · Official

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques

Methods: Synthetic Minority Over-sampling Technique (SMOTE)
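SMOTE balances classes by creating synthetic minority examples: it picks a minority point, picks one of its nearest minority neighbours, and interpolates a new point somewhere on the segment between them. The plain-Python sketch below shows only that core idea. The function name `smote_sketch`, the default `k`, and the seeding are assumptions for this example. SMOTE works on numeric feature vectors, so text would need to be vectorised first, and the listing above does not say which representation the paper used.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours.

    `minority` is a list of equal-length numeric feature lists (at least two).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic
```

Because every synthetic point lies between two real minority points, the new samples stay inside the region the minority class already occupies. For real experiments, the `SMOTE` class in the imbalanced-learn package is a maintained implementation.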