Predicting Different Types of Subtle Toxicity in Unhealthy Online Conversations

Shlok Gilda; Mirela Silva; Luiz Giovanini; Daniela Oliveira

arXiv:2106.03952·cs.CL·January 28, 2022

TL;DR

This study trains machine learning models to classify subtle forms of toxicity in online comments, reaching a top micro F1-score of 88.76%, and examines how sentiment and detectability differ across types of unhealthy comments.

Contribution

It applies machine learning to a public dataset of 44K online comments to detect seven forms of subtle toxicity, and reports micro F1-score, macro F1-score, and ROC-AUC.

Findings

01

Top micro F1-score of 88.76% (macro F1-score 67.98%, ROC-AUC 0.71)

02

Hostile comments are easier to detect than other types of unhealthy comments

03

Most types of unhealthy comments carry a slight negative sentiment, with hostile comments the most negative

Abstract

This paper investigates the use of machine learning models for the classification of unhealthy online conversations containing one or more forms of subtler abuse, such as hostility, sarcasm, and generalization. We leveraged a public dataset of 44K online comments containing healthy and unhealthy comments labeled with seven forms of subtle toxicity. We were able to distinguish between these comments with a top micro F1-score, macro F1-score, and ROC-AUC of 88.76%, 67.98%, and 0.71, respectively. Hostile comments were easier to detect than other types of unhealthy comments. We also conducted a sentiment analysis which revealed that most types of unhealthy comments were associated with a slight negative sentiment, with hostile comments being the most negative ones.
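The abstract reports a micro F1-score well above the macro F1-score. The sketch below shows how the two averages are usually computed for multi-label predictions, and how they can diverge when a frequent label is predicted well and a rare one is missed. The three label columns and the toy data are hypothetical and not taken from the paper, and this summary does not state whether class imbalance explains the gap the authors observed.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[int]]  # rows = comments, columns = binary labels


def label_counts(y_true: Matrix, y_pred: Matrix) -> List[Tuple[int, int, int]]:
    """Return (true positives, false positives, false negatives) per label."""
    counts = []
    for j in range(len(y_true[0])):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        counts.append((tp, fp, fn))
    return counts


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """Pool counts over all labels, then compute a single F1."""
    counts = label_counts(y_true, y_pred)
    return f1(sum(c[0] for c in counts),
              sum(c[1] for c in counts),
              sum(c[2] for c in counts))


def macro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """Compute F1 per label, then take the unweighted mean."""
    scores = [f1(*c) for c in label_counts(y_true, y_pred)]
    return sum(scores) / len(scores)


# Hypothetical toy data with columns [healthy, hostile, sarcastic].
Y_TRUE = [[1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
Y_PRED = [[1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [0, 1, 0], [1, 0, 0]]

if __name__ == "__main__":
    print(f"micro F1 = {micro_f1(Y_TRUE, Y_PRED):.4f}")
    print(f"macro F1 = {macro_f1(Y_TRUE, Y_PRED):.4f}")
```

In this toy case the single missed rare label costs the macro average a full third of its value, while the micro average is dominated by the frequent label. This is one reason both numbers are commonly reported together.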
