Machine Learning Suites for Online Toxicity Detection
David Noever

TL;DR
This paper systematically evaluates 62 classifiers across 19 algorithmic families for online toxicity detection, highlighting the effectiveness of tree-based models and simple bad word lists in classifying toxic comments.
Contribution
It provides a comprehensive comparison of classifiers and features for toxicity detection, emphasizing the interpretability and predictive power of tree-based algorithms and basic bad word lists.
Findings
Tree-based classifiers provide the most transparently explainable rules and rank-order the predictive contribution of each feature.
A simple bad word list is the most predictive of the 28 features tested.
The feature set covers syntax, sentiment, emotion, and outlier word dictionaries.
Abstract
To identify and classify toxic online commentary, the modern tools of data science transform raw text into key features from which either thresholding or learning algorithms can make predictions for monitoring offensive conversations. We systematically evaluate 62 classifiers representing 19 major algorithmic families against features extracted from the Jigsaw dataset of Wikipedia comments. We compare the classifiers based on statistically significant differences in accuracy and relative execution time. Among these classifiers for identifying toxic comments, tree-based algorithms provide the most transparently explainable rules and rank-order the predictive contribution of each feature. Among 28 features of syntax, sentiment, emotion and outlier word dictionaries, a simple bad word list proves most predictive of offensive commentary.
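The pipeline described in the abstract (raw text, then features, then a thresholding or learning step) can be sketched in a few lines. The following is a minimal illustration only, not the paper's implementation: the word list `BAD_WORDS`, the three features, the toy comments, and the helper names `extract_features`, `best_stump`, and `rank_features` are all hypothetical. A one-level decision stump stands in for the tree-based classifiers to show how a threshold rule on each feature yields a readable rule and a rank order of features.

```python
import re

# Hypothetical placeholder lexicon; the paper's actual bad word list is not reproduced here.
BAD_WORDS = {"idiot", "stupid", "moron"}


def extract_features(comment):
    """Turn a raw comment into a small dictionary of numeric features."""
    tokens = re.findall(r"[a-z']+", comment.lower())
    letters = [c for c in comment if c.isalpha()]
    upper = sum(c.isupper() for c in letters)
    return {
        "bad_word_count": sum(t in BAD_WORDS for t in tokens),
        "exclamation_count": comment.count("!"),
        "uppercase_ratio": upper / len(letters) if letters else 0.0,
    }


def best_stump(rows, labels, feature):
    """Find the threshold t maximizing accuracy of the rule 'feature > t => toxic'."""
    best_acc, best_t = 0.0, None
    for t in sorted({row[feature] for row in rows}):
        preds = [row[feature] > t for row in rows]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:
            best_acc, best_t = acc, t
    return best_acc, best_t


def rank_features(rows, labels):
    """Rank features by the accuracy of their best single-threshold rule."""
    scored = [(name, *best_stump(rows, labels, name)) for name in rows[0]]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy labelled comments (invented for illustration, not drawn from the Jigsaw dataset).
DATA = [
    ("You are an idiot and a moron", True),
    ("Thanks for fixing the citation!", False),
    ("What a stupid edit", True),
    ("I disagree, see the talk page", False),
    ("GREAT WORK on this article!", False),
    ("stupid idiot", True),
]

rows = [extract_features(text) for text, _ in DATA]
labels = [label for _, label in DATA]

for name, acc, threshold in rank_features(rows, labels):
    print(f"{name}: toxic if value > {threshold} (accuracy {acc:.2f})")
```

On this toy data the bad word count separates the two classes with a single threshold, which mirrors the abstract's claim in miniature. A full decision tree would stack such rules, and the six invented comments say nothing about performance on real data.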
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Advanced Malware Detection Techniques · Adversarial Robustness in Machine Learning
Methods: Jigsaw
