Harnessing Artificial Intelligence to Combat Online Hate: Exploring the Challenges and Opportunities of Large Language Models in Hate Speech Detection

Tharindu Kumarage; Amrita Bhattacharjee; Joshua Garland

arXiv:2403.08035 · cs.CL · March 14, 2024 · 6 cites

Open Access

TL;DR

This paper reviews the use of large language models (LLMs) for hate speech detection, combining a literature review with empirical testing to assess their capabilities and limitations.

Contribution

The paper pairs a comprehensive literature review with an empirical analysis of LLM performance in hate speech detection, highlighting the key factors that influence effectiveness.

Findings

01. Certain LLMs outperform others in hate speech classification.

02. Training data and model attributes significantly impact detection accuracy.

03. The study identifies key challenges and opportunities in deploying LLMs for this task.

Abstract

Large language models (LLMs) excel in many diverse applications beyond language generation, e.g., translation, summarization, and sentiment analysis. One intriguing application is text classification, which becomes pertinent in identifying hateful or toxic speech, a domain fraught with challenges and ethical dilemmas. Our study has two objectives. First, we offer a literature review of LLMs as classifiers, emphasizing their role in detecting and classifying hateful or toxic content. Second, we explore the efficacy of several LLMs in classifying hate speech, identifying which LLMs excel in this task as well as their underlying attributes and training. This provides insight into the factors that contribute to an LLM's proficiency (or lack thereof) in discerning hateful content. By combining a comprehensive literature review with an empirical…

Taxonomy

Topics: Hate Speech and Cyberbullying Detection