Investigating Annotator Bias in Large Language Models for Hate Speech Detection

Amit Das; Zheng Zhang; Najib Hasan; Souvika Sarkar; Fatemeh Jamshidi; Tathagata Bhattacharya; Mostafa Rahgouy; Nilanjana Raychawdhary; Dongji Feng; Vinija Jain; Aman Chadha; Mary Sandage; Lauramarie Pope; Gerry Dozier; Cheryl Seals

arXiv:2406.11109·cs.CL·November 19, 2024·2 cites

TL;DR

This paper investigates biases in large language models when annotating hate speech, focusing on gender, race, religion, and disability, and introduces a new dataset to analyze these biases comprehensively.

Contribution

It provides a detailed analysis of LLM biases in hate speech annotation across four categories and introduces HateBiasNet, a new dataset for bias research.

Findings

01. LLMs exhibit significant biases in hate speech annotation.

02. Bias varies across different LLMs and demographic categories.

03. The study offers insights for improving LLM-based annotation methods.

Abstract

Data annotation, the practice of assigning descriptive labels to raw data, is pivotal in optimizing the performance of machine learning models. However, it is a resource-intensive process susceptible to biases introduced by annotators. The emergence of sophisticated Large Language Models (LLMs) presents a unique opportunity to modernize and streamline this complex procedure. While existing research extensively evaluates the efficacy of LLMs as annotators, this paper delves into the biases present in LLMs when annotating hate speech data. Our research contributes to understanding biases in four key categories (gender, race, religion, and disability) with four LLMs: GPT-3.5, GPT-4o, Llama-3.1, and Gemma-2. Specifically targeting highly vulnerable groups within these categories, we analyze annotator biases. Furthermore, we conduct a comprehensive examination of potential factors…
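One simple way to make per-category comparisons of the kind the abstract describes concrete is to measure how often an LLM's label differs from a reference label within each category. The sketch below is illustrative only and is not the paper's method: the record fields (`category`, `reference`, `llm`), the label strings, and the toy records are all hypothetical.

```python
from collections import defaultdict


def disagreement_by_category(records):
    """Return the share of records per category where the LLM label
    differs from the reference label.

    `records` is an iterable of dicts with the hypothetical keys
    'category', 'reference', and 'llm'.
    """
    totals = defaultdict(int)
    mismatches = defaultdict(int)
    for record in records:
        category = record["category"]
        totals[category] += 1
        if record["llm"] != record["reference"]:
            mismatches[category] += 1
    return {category: mismatches[category] / totals[category] for category in totals}


# Toy, made-up records for illustration; not taken from the paper or its dataset.
sample = [
    {"category": "gender", "reference": "hate", "llm": "hate"},
    {"category": "gender", "reference": "hate", "llm": "not_hate"},
    {"category": "race", "reference": "not_hate", "llm": "not_hate"},
    {"category": "religion", "reference": "hate", "llm": "not_hate"},
]

rates = disagreement_by_category(sample)
for category, rate in sorted(rates.items()):
    print(f"{category}: {rate:.2f}")
```

A raw disagreement rate does not by itself indicate bias; it only flags categories where an LLM's labels diverge from the reference more often and may deserve closer inspection.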

Code & Models

Datasets

AmitDasRup123/HateSpeechCorpus
dataset · 24 downloads

Taxonomy

Topics: Hate Speech and Cyberbullying Detection

Methods: Attention Is All You Need · GPT-3 · Cosine Annealing · Byte Pair Encoding · Attention Dropout · Weight Decay · Dropout