Towards non-toxic landscapes: Automatic toxic comment detection using DNN
Ashwin Geet D'Sa, Irina Illina, Dominique Fohr

TL;DR
This paper develops and compares deep neural network models, including BERT fine-tuning, for automatic detection of toxic comments in online media, addressing the challenge of defining and identifying toxic speech.
Contribution
It introduces binary classification and regression approaches for toxic comment detection and evaluates their robustness against adversarial word additions.
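The one-word robustness test can be illustrated with a minimal sketch. Everything named here is hypothetical and is not the authors' code: the `toy_toxicity_score` lexicon scorer merely stands in for a trained DNN, and the 0.5 threshold and the `flip_rate` helper are assumptions. The continuous score plays the role of a regression output, and thresholding it gives the binary decision.

```python
from typing import Callable, Sequence

# Hypothetical toy lexicons: a stand-in for a trained model, not the paper's classifier.
TOXIC_WORDS = {"idiot", "stupid", "hate"}
HEALTHY_WORDS = {"thanks", "love", "please"}


def toy_toxicity_score(comment: str) -> float:
    """Regression-style output: a toxicity score clipped to [0, 1]."""
    tokens = comment.lower().split()
    if not tokens:
        return 0.0
    toxic = sum(t in TOXIC_WORDS for t in tokens)
    healthy = sum(t in HEALTHY_WORDS for t in tokens)
    return min(1.0, max(0.0, 0.5 + (toxic - healthy) / len(tokens)))


def is_toxic(score: float, threshold: float = 0.5) -> bool:
    """Binary decision obtained by thresholding the score (assumed threshold)."""
    return score > threshold


def flip_rate(
    comments: Sequence[str],
    added_word: str,
    score_fn: Callable[[str], float] = toy_toxicity_score,
    threshold: float = 0.5,
) -> float:
    """Fraction of comments whose label changes when one word is appended."""
    if not comments:
        return 0.0
    flips = 0
    for comment in comments:
        before = is_toxic(score_fn(comment), threshold)
        after = is_toxic(score_fn(f"{comment} {added_word}"), threshold)
        flips += before != after
    return flips / len(comments)


if __name__ == "__main__":
    sample = ["you are an idiot", "have a nice day"]
    print("healthy word added:", flip_rate(sample, "love"))
    print("toxic word added:", flip_rate(sample, "idiot"))
```

Passing a real model's scoring function as `score_fn` would let the same loop compare how often different models change their decision after a single added word.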
Findings
BERT fine-tuning outperforms other word representations.
The proposed models effectively identify toxic comments.
Robustness to adversarial attacks varies with model type.
Abstract
The spectacular expansion of the Internet has led to the development of a new research problem in the field of natural language processing: automatic toxic comment detection, since many countries prohibit hate speech in public media. There is no clear and formal definition of hate, offensive, toxic, and abusive speech. In this article, we put all these terms under the umbrella of "toxic" speech. The contribution of this paper is the design of binary classification and regression-based approaches aiming to predict whether a comment is toxic or not. We compare different unsupervised word representations and different DNN-based classifiers. Moreover, we study the robustness of the proposed approaches to adversarial attacks that add one (healthy or toxic) word. We evaluate the proposed methodology on the English Wikipedia Detox corpus. Our experiments show that using BERT fine-tuning…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Advanced Malware Detection Techniques
Methods: Linear Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam · WordPiece · Softmax
