Improving Classifier Training Efficiency for Automatic Cyberbullying Detection with Feature Density
Juuso Eronen, Michal Ptaszynski, Fumito Masui, Aleksander Smywiński-Pohl, Gniewosz Leliwa, Michal Wroczynski

TL;DR
This paper introduces a method using Feature Density to estimate dataset complexity and predict classifier performance, aiming to reduce training iterations and resource consumption in cyberbullying detection across multiple languages.
Contribution
It proposes a novel approach to estimate dataset complexity with Feature Density, optimizing classifier training efficiency for multilingual cyberbullying detection tasks.
Findings
Feature Density effectively estimates dataset complexity.
Linguistically-backed preprocessing improves model performance.
The method reduces the number of training iterations and the computational resources required.
Abstract
We study the effectiveness of Feature Density (FD) using different linguistically-backed feature preprocessing methods in order to estimate dataset complexity, which in turn is used to comparatively estimate the potential performance of machine learning (ML) classifiers prior to any training. We hypothesise that estimating dataset complexity allows for the reduction of the number of required experiment iterations. This way we can optimize the resource-intensive training of ML models, which is becoming a serious issue due to the increases in available dataset sizes and the ever-rising popularity of models based on Deep Neural Networks (DNN). The problem of constantly increasing needs for more powerful computational resources is also affecting the environment due to the alarmingly growing amount of CO2 emissions caused by the training of large-scale ML models. The research was conducted on…
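The abstract excerpt does not spell out how Feature Density is computed. A minimal sketch, assuming FD is the number of unique features in a corpus divided by the total number of feature occurrences, could look like the following. The function names (`feature_density`, `tokenize`), the whitespace tokenizer, and the toy corpora are hypothetical; the tokenizer is only a stand-in for the linguistically-backed preprocessing the paper refers to.

```python
from collections import Counter


def tokenize(text):
    # Assumption: plain lowercased whitespace tokens stand in for the
    # paper's linguistically-backed preprocessing (not reproduced here).
    return text.lower().split()


def feature_density(documents):
    # Assumed definition: unique features / total feature occurrences.
    # `documents` is an iterable of token lists.
    counts = Counter(tok for doc in documents for tok in doc)
    total = sum(counts.values())
    return len(counts) / total if total else 0.0


# Toy corpora, invented for illustration only.
corpus_a = ["you are so dumb", "you are so so dumb"]
corpus_b = ["nobody likes your posts", "have a great day friend"]

fd_a = feature_density([tokenize(t) for t in corpus_a])
fd_b = feature_density([tokenize(t) for t in corpus_b])
```

Under this assumed definition, a corpus with many repeated tokens (`corpus_a`) gets a lower FD than one where every token is distinct (`corpus_b`). Computing such a score for each preprocessing variant before any training is the kind of comparison the abstract describes; how FD relates to classifier performance is a result of the paper and is not shown by this sketch.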
