Offensive Language Analysis using Deep Learning Architecture
Ryan Ong

TL;DR
This paper explores deep learning models, especially BiLSTM-CNN, for offensive language detection in social media, experimenting with data balancing techniques and model variations to optimize performance.
Contribution
It systematically evaluates RNN-CNN variations and preprocessing methods, identifying the most effective architecture for offensive language classification.
Findings
BiLSTM-CNN achieves the highest macro F1-score among the architectures tested.
SMOTE and class weights help counter the imbalance between classes.
Adding CNN layers sometimes decreases performance.
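The findings are reported in macro F1, which averages per-class F1 scores so that the minority class counts as much as the majority class. The sketch below is a minimal pure-Python illustration of that metric, not the paper's evaluation code; the function name `macro_f1` and the toy labels are assumptions made for the example.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (illustrative helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy imbalanced data: a classifier that always predicts the majority class.
y_true = ["NOT"] * 8 + ["OFF"] * 2
y_pred = ["NOT"] * 10
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                    # 0.8
print(macro_f1(y_true, y_pred))    # about 0.444
```

In this toy case accuracy looks acceptable while macro F1 is low, which is why macro F1 is a more informative score when classes are imbalanced.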
Abstract
SemEval-2019 Task 6 (Zampieri et al., 2019b) requires us to identify and categorise offensive language in social media. In this paper we describe the process we took to tackle this challenge. Our process is heavily inspired by Sosa (2017), who proposed CNN-LSTM and LSTM-CNN models for Twitter sentiment analysis. We decided to follow his approach and to extend his work by testing different variations of RNN models combined with CNN. Specifically, we have divided the challenge into two parts: (1) data processing and sampling, and (2) choosing the optimal deep learning architecture. In preprocessing, we experimented with two techniques, SMOTE and Class Weights, to counter the imbalance between classes. Once we are happy with the quality of our input data, we proceed to choosing the optimal deep learning architecture for this task. Given the quality and quantity of data we have been…
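The abstract names Class Weights as one way to counter class imbalance. A minimal sketch of one common weighting scheme follows, assuming the "balanced" heuristic weight = N / (K * n_c), where N is the number of samples, K the number of classes, and n_c the count of class c. The paper excerpt does not state which formula was used, so the function name `balanced_class_weights` and the toy label counts are illustrative assumptions.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency (assumed heuristic)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Toy label distribution: 8 non-offensive, 2 offensive examples.
labels = ["NOT"] * 8 + ["OFF"] * 2
weights = balanced_class_weights(labels)
print(weights)  # {'NOT': 0.625, 'OFF': 2.5}
```

Weights of this kind are typically passed to the loss function so that errors on the rarer class cost more, without changing the training data itself.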
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques
Methods: Synthetic Minority Over-sampling Technique (SMOTE)
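
SMOTE creates synthetic minority-class samples by interpolating between a minority point and one of its nearest minority neighbours. The sketch below illustrates that core idea in pure Python on toy 2-D feature vectors. It is not the paper's pipeline: the function name `smote_sketch`, the parameters `k` and `seed`, and the toy data are assumptions, and a real setup would usually rely on an established library implementation applied to numeric text features.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points from minority-class vectors.

    Requires at least two minority points. Illustrative only.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_sketch(minority, n_new=6)
print(len(new_points))  # 6
```

Because every synthetic point lies on a segment between two real minority points, the new samples stay inside the region the minority class already occupies.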
