Mitigating Biases to Embrace Diversity: A Comprehensive Annotation Benchmark for Toxic Language

Xinmeng Hou

arXiv:2410.13313 · cs.CL · October 18, 2024

Open Access

TL;DR

This paper presents a new annotation benchmark for toxic language that reduces bias, improves annotation consistency, and demonstrates the effectiveness of LLMs as annotation tools, especially when data is limited or diverse language is involved.

Contribution

It introduces two newly annotated datasets with higher human-LLM inter-annotator agreement than the original datasets, and shows that smaller models fine-tuned on multi-source LLM-annotated data outperform models trained on larger, single-source human-annotated datasets.

Findings

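01. LLMs can serve as effective alternatives when professional annotators are unavailable.

02. Structured, prescriptive guidelines reduce subjective variability in annotation.

03. Smaller models fine-tuned on multi-source LLM-annotated data outperform models trained on larger, single-source human-annotated datasets.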

Abstract

This study introduces a prescriptive annotation benchmark grounded in humanities research to ensure consistent, unbiased labeling of offensive language, particularly for casual and non-mainstream language uses. We contribute two newly annotated datasets that achieve higher inter-annotator agreement between human and language model (LLM) annotations compared to original datasets based on descriptive instructions. Our experiments show that LLMs can serve as effective alternatives when professional annotators are unavailable. Moreover, smaller models fine-tuned on multi-source LLM-annotated data outperform models trained on larger, single-source human-annotated datasets. These findings highlight the value of structured guidelines in reducing subjective variability, maintaining performance with limited data, and embracing language diversity. Content Warning: This article only analyzes…
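
The abstract reports inter-annotator agreement between human and LLM annotations without naming the statistic on this page. As a minimal sketch of how such agreement can be measured, the example below computes Cohen's kappa, a common chance-corrected agreement score for two annotators. The choice of Cohen's kappa, the function name `cohen_kappa`, and the toy labels are assumptions made for illustration; they are not taken from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators (Cohen's kappa).

    Illustrative helper only; the paper's actual agreement metric is not
    stated in this summary.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement if each annotator labeled at random according to
    # their own label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical toy data: one human annotator and one LLM annotator.
human = ["toxic", "ok", "ok", "toxic", "ok", "ok"]
llm = ["toxic", "ok", "toxic", "toxic", "ok", "ok"]

kappa = cohen_kappa(human, llm)
```

A value of 1 means perfect agreement and 0 means agreement no better than chance, so comparing kappa under descriptive versus prescriptive guidelines is one way to quantify a change in annotation consistency.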

Taxonomy

Topics: Hate Speech and Cyberbullying Detection