"HOT" ChatGPT: The promise of ChatGPT in detecting and discriminating hateful, offensive, and toxic comments on social media
Lingyao Li, Lizhou Fan, Shubham Atreja, Libby Hemphill

TL;DR
This study evaluates ChatGPT's ability to detect harmful social media comments, showing it achieves around 80% accuracy and highlighting the influence of prompts on its performance.
Contribution
It demonstrates ChatGPT's potential for harmful content detection and analyzes how prompt design affects its classification accuracy and consistency.
Findings
ChatGPT achieves approximately 80% accuracy when compared with MTurker annotations.
It classifies non-harmful comments more consistently than harmful ones.
Prompt choice significantly impacts ChatGPT's detection performance.
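The two quantities behind these findings, agreement with human annotations and consistency across repeated queries, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the binary label encoding (1 = harmful, 0 = non-harmful), and the toy data are all assumptions made up for illustration, and the paper's own metrics may be defined differently.

```python
from collections import Counter


def accuracy(model_labels, human_labels):
    """Share of comments where the model label equals the human label."""
    if len(model_labels) != len(human_labels):
        raise ValueError("label lists must have the same length")
    if not human_labels:
        raise ValueError("label lists must not be empty")
    matches = sum(m == h for m, h in zip(model_labels, human_labels))
    return matches / len(human_labels)


def consistency(runs):
    """For each comment, the share of repeated runs that agree with
    the majority label for that comment (1.0 = fully consistent)."""
    per_comment = []
    for labels in zip(*runs):  # all run outputs for a single comment
        top_count = Counter(labels).most_common(1)[0][1]
        per_comment.append(top_count / len(labels))
    return per_comment


# Toy data, invented for illustration only (1 = harmful, 0 = non-harmful).
human = [1, 0, 0, 1, 0]
runs = [
    [1, 0, 0, 0, 0],  # hypothetical labels from a first query
    [1, 0, 0, 1, 0],  # hypothetical labels from a second query
    [0, 0, 0, 1, 0],  # hypothetical labels from a third query
]

run_accuracy = accuracy(runs[0], human)
per_comment_consistency = consistency(runs)
```

In this toy data the model labels for the two harmful comments vary across runs while the labels for the non-harmful comments do not, which mirrors the kind of pattern the second finding describes.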
Abstract
Harmful content is pervasive on social media, poisoning online communities and negatively impacting participation. A common approach to address this issue is to develop detection models that rely on human annotations. However, the tasks required to build such models expose annotators to harmful and offensive content and may require significant time and cost to complete. Generative AI models have the potential to understand and detect harmful content. To investigate this potential, we used ChatGPT and compared its performance with MTurker annotations for three frequently discussed concepts related to harmful content: Hateful, Offensive, and Toxic (HOT). We designed five prompts to interact with ChatGPT and conducted four experiments eliciting HOT classifications. Our results show that ChatGPT can achieve an accuracy of approximately 80% when compared to MTurker annotations. Specifically,…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Misinformation and Its Impacts
Methods: ALIGN
