Predicting Different Types of Subtle Toxicity in Unhealthy Online Conversations
Shlok Gilda, Mirela Silva, Luiz Giovanini, Daniela Oliveira

TL;DR
This study trains machine learning models to classify subtle forms of toxicity in online comments, reaching a top micro F1-score of 88.76%, and examines how sentiment and detectability differ across types of unhealthy comments.
Contribution
It trains and evaluates machine learning classifiers on a public dataset of 44K online comments labeled with seven forms of subtle toxicity, and reports micro F1-score, macro F1-score, and ROC-AUC.
Findings
Top micro F1-score of 88.76%, macro F1-score of 67.98%, and ROC-AUC of 0.71
Hostile comments are easier to detect than other types
Most types of unhealthy comments carry a slight negative sentiment, with hostile comments being the most negative
Abstract
This paper investigates the use of machine learning models for the classification of unhealthy online conversations containing one or more forms of subtler abuse, such as hostility, sarcasm, and generalization. We leveraged a public dataset of 44K online comments containing healthy and unhealthy comments labeled with seven forms of subtle toxicity. We were able to distinguish between these comments with a top micro F1-score, macro F1-score, and ROC-AUC of 88.76%, 67.98%, and 0.71, respectively. Hostile comments were easier to detect than other types of unhealthy comments. We also conducted a sentiment analysis which revealed that most types of unhealthy comments were associated with a slight negative sentiment, with hostile comments being the most negative ones.
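The gap between the reported micro and macro F1-scores is typical of imbalanced multi-label data: micro F1 pools every label decision, so frequent labels dominate it, while macro F1 averages per-label scores, so rare labels count equally. The sketch below illustrates the two averages on made-up toy data. It is not the authors' evaluation code; the two labels, the sample counts, and the helper names are assumptions chosen only for illustration.

```python
from typing import List, Tuple

def label_counts(y_true: List[List[int]], y_pred: List[List[int]], k: int) -> Tuple[int, int, int]:
    """Return (true positives, false positives, false negatives) for label index k."""
    tp = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred[k] and truth[k]:
            tp += 1
        elif pred[k] and not truth[k]:
            fp += 1
        elif truth[k] and not pred[k]:
            fn += 1
    return tp, fp, fn

def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def micro_macro_f1(y_true: List[List[int]], y_pred: List[List[int]]) -> Tuple[float, float]:
    """Compute micro- and macro-averaged F1 for multi-label indicator rows."""
    n_labels = len(y_true[0])
    counts = [label_counts(y_true, y_pred, k) for k in range(n_labels)]
    # Macro: average the per-label F1 scores, each label weighted equally.
    macro = sum(f1(*c) for c in counts) / n_labels
    # Micro: pool the counts over all labels, then compute a single F1.
    micro = f1(sum(c[0] for c in counts),
               sum(c[1] for c in counts),
               sum(c[2] for c in counts))
    return micro, macro

# Hypothetical toy data: column 0 = "healthy" (frequent), column 1 = "hostile" (rare).
y_true = [[1, 0]] * 8 + [[0, 1]] * 2
# The classifier gets every healthy comment right but misses one of the two hostile ones.
y_pred = [[1, 0]] * 8 + [[0, 1], [1, 0]]

micro, macro = micro_macro_f1(y_true, y_pred)
print(f"micro F1 = {micro:.4f}, macro F1 = {macro:.4f}")
```

On this toy data the rare label drags the macro average below the micro average, the same direction as the gap between the 88.76% and 67.98% figures reported in the abstract.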
