"HOT" ChatGPT: The promise of ChatGPT in detecting and discriminating hateful, offensive, and toxic comments on social media

Lingyao Li; Lizhou Fan; Shubham Atreja; Libby Hemphill

arXiv:2304.10619 · cs.CL · April 29, 2024 · 20 cites

TL;DR

This study evaluates ChatGPT's ability to detect hateful, offensive, and toxic (HOT) comments on social media. It finds that ChatGPT reaches roughly 80% accuracy when measured against MTurker annotations and that prompt design influences its performance.

Contribution

It demonstrates ChatGPT's potential for harmful content detection and analyzes how prompt design affects its classification accuracy and consistency.

Findings

01. ChatGPT achieves ~80% accuracy compared to human annotations.

02. It classifies non-harmful comments more consistently than harmful ones.

03. Prompt choice significantly impacts ChatGPT's detection performance.

Abstract

Harmful content is pervasive on social media, poisoning online communities and negatively impacting participation. A common approach to address this issue is to develop detection models that rely on human annotations. However, the tasks required to build such models expose annotators to harmful and offensive content and may require significant time and cost to complete. Generative AI models have the potential to understand and detect harmful content. To investigate this potential, we used ChatGPT and compared its performance with MTurker annotations for three frequently discussed concepts related to harmful content: Hateful, Offensive, and Toxic (HOT). We designed five prompts to interact with ChatGPT and conducted four experiments eliciting HOT classifications. Our results show that ChatGPT can achieve an accuracy of approximately 80% when compared to MTurker annotations. Specifically,…
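The accuracy figure in the abstract is an agreement rate between ChatGPT's labels and MTurker annotations. A minimal sketch of how such a comparison could be computed is shown below. It is not the authors' code: the function names, the majority-vote aggregation of annotator votes, the tie-breaking rule, and the sample data are all assumptions made for illustration.

```python
def majority_label(votes):
    """Collapse one comment's annotator votes (True = harmful) into one label.

    Assumption: a strict majority is required, so ties resolve to False.
    """
    if not votes:
        raise ValueError("need at least one annotator vote")
    return 2 * sum(bool(v) for v in votes) > len(votes)


def accuracy(model_labels, human_labels):
    """Fraction of comments where the model label matches the human label."""
    if len(model_labels) != len(human_labels):
        raise ValueError("label lists must have the same length")
    if not human_labels:
        raise ValueError("need at least one labeled comment")
    matches = sum(m == h for m, h in zip(model_labels, human_labels))
    return matches / len(human_labels)


# Hypothetical data: three annotator votes per comment for one HOT concept.
annotator_votes = [
    [True, True, False],
    [False, False, False],
    [True, False, False],
    [True, True, True],
    [False, True, False],
]
# Hypothetical model output for the same five comments.
model_labels = [True, False, True, True, False]

human_labels = [majority_label(v) for v in annotator_votes]
print(f"accuracy vs. human labels: {accuracy(model_labels, human_labels):.2f}")
```

In a setup like this, the same computation would be repeated separately for each of the three concepts (hateful, offensive, toxic) and for each prompt, since the abstract reports that prompt choice affects the result.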

Code & Models

Repositories

cactilab/hateguard

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Misinformation and Its Impacts

Methods: ALIGN