Initial Study into Application of Feature Density and Linguistically-backed Embedding to Improve Machine Learning-based Cyberbullying Detection

Juuso Eronen; Michal Ptaszynski; Fumito Masui; Gniewosz Leliwa; Michal Wroczynski; Mateusz Piech; Aleksander Smywinski-Pohl

arXiv:2206.01889·cs.CL·June 7, 2022·1 cites

Open Access

TL;DR

This study explores how linguistic preprocessing and feature density influence machine learning performance in cyberbullying detection, introducing linguistically-backed embeddings for CNNs and confirming the predictive value of feature density.

Contribution

It introduces a new approach of training linguistically-backed embeddings for CNNs and demonstrates the correlation between feature density and classifier performance.

Findings

01 Neural networks effectively detect cyberbullying.

02 Feature density correlates with classifier performance.

03 Linguistically-backed embeddings improve CNN accuracy.

Abstract

In this research, we study how the performance of machine learning (ML) classifiers changes when various linguistic preprocessing methods are applied to a dataset, with a specific focus on linguistically-backed embeddings in Convolutional Neural Networks (CNN). Moreover, we study the concept of Feature Density and confirm its potential to comparatively predict the performance of ML classifiers, including CNN. The research was conducted on a Formspring dataset provided in a Kaggle competition on automatic cyberbullying detection. The dataset was re-annotated by objective experts (psychologists), as the importance of professional annotation in cyberbullying research has been indicated multiple times. The study confirmed the effectiveness of Neural Networks in cyberbullying detection and the correlation between classifier performance and Feature Density while also proposing a new…
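
The abstract does not define Feature Density, so the following is only a minimal sketch under an assumed definition: the number of unique features in a corpus divided by the total number of feature occurrences. The function name `feature_density` and the toy token lists are hypothetical and are not taken from the paper or its dataset. The sketch illustrates why the measure depends on preprocessing: lowercasing merges surface forms and so lowers the density.

```python
from collections import Counter


def feature_density(documents):
    """Assumed definition: unique features / total feature occurrences.

    `documents` is an iterable of token lists. Returns 0.0 for an empty corpus.
    """
    counts = Counter(token for doc in documents for token in doc)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return len(counts) / total


# Hypothetical toy corpus: the same two messages, before and after lowercasing.
raw_tokens = [["You", "are", "so", "stupid"], ["you", "Are", "kind"]]
lowercased = [[token.lower() for token in doc] for doc in raw_tokens]

print("raw:", feature_density(raw_tokens))
print("lowercased:", feature_density(lowercased))
```

Computing this value for each preprocessed variant of a dataset gives one number per variant, which could then be compared against the classifier scores obtained on that variant.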

Taxonomy

Topics: Hate Speech and Cyberbullying Detection