Annotation alignment: Comparing LLM and human annotations of conversational safety

Rajiv Movva; Pang Wei Koh; Emma Pierson

arXiv:2406.06369·cs.CL·October 8, 2024

TL;DR

This study compares GPT-4's safety annotations with human judgments across diverse conversations, revealing moderate alignment and highlighting the need for larger datasets to understand demographic disparities.

Contribution

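The paper analyzes LLM-human annotation alignment on conversational safety using the DICES dataset (350 conversations, each rated by 112 annotators spanning 10 race-gender groups). It shows that the data are too limited to resolve whether alignment differs across demographic groups, and that correlations vary substantially within groups.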

Findings

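01. GPT-4 correlates with the average human annotator's safety rating at r = 0.59, higher than the median annotator's correlation with the average (r = 0.51)

02. Larger datasets are needed to assess demographic disparities

03. GPT-4 cannot predict safety differences across demographic groups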

Abstract

Do LLMs align with human perceptions of safety? We study this question via annotation alignment, the extent to which LLMs and humans agree when annotating the safety of user-chatbot conversations. We leverage the recent DICES dataset (Aroyo et al., 2023), in which 350 conversations are each rated for safety by 112 annotators spanning 10 race-gender groups. GPT-4 achieves a Pearson correlation of r = 0.59 with the average annotator rating, higher than the median annotator's correlation with the average (r = 0.51). We show that larger datasets are needed to resolve whether LLMs exhibit disparities in how well they correlate with different demographic groups. Also, there is substantial idiosyncratic variation in correlation within groups, suggesting that race & gender do not fully capture differences in alignment. Finally, we find that GPT-4 cannot predict when one demographic…
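The comparison in the abstract (a model's Pearson correlation with the average annotator rating, set against the median individual annotator's correlation with that same average) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `pearson` and `annotation_alignment` are hypothetical, the ratings are synthetic rather than drawn from DICES, and each annotator's own rating is left inside the average, which may differ from the paper's exact procedure.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation between two equal-length 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))


def annotation_alignment(human, model):
    """Compare a model's ratings against a panel of human annotators.

    human: array of shape (n_annotators, n_conversations)
    model: array of shape (n_conversations,)
    Returns (model_r, median_annotator_r), both measured against the
    per-conversation mean of the human ratings.
    Assumption: each annotator's own rating stays in the mean.
    """
    human = np.asarray(human, dtype=float)
    mean_rating = human.mean(axis=0)
    model_r = pearson(model, mean_rating)
    annotator_rs = [pearson(row, mean_rating) for row in human]
    return model_r, float(np.median(annotator_rs))


# Synthetic stand-in for the setup in the abstract: 112 annotators,
# 350 conversations. These are NOT the DICES ratings.
rng = np.random.default_rng(0)
latent = rng.normal(size=350)  # hypothetical "true" unsafety per conversation
human = (latent + rng.normal(scale=1.5, size=(112, 350)) > 0).astype(float)
model = latent + rng.normal(scale=1.0, size=350)

model_r, median_r = annotation_alignment(human, model)
print(f"model vs. average rating: r = {model_r:.2f}")
print(f"median annotator vs. average rating: r = {median_r:.2f}")
```

The numbers printed by this sketch depend entirely on the synthetic noise levels and say nothing about GPT-4 or DICES; the point is only the shape of the computation. Averaging over many annotators removes individual noise, which is one reason a single rater (human or model) can correlate only moderately with the average.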

Taxonomy

Topics: Occupational Health and Safety Research · Safety Warnings and Signage · Software Engineering Research

Methods: Attention Is All You Need · Softmax · ALIGN · Layer Normalization · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Multi-Head Attention