Robustness and Confounders in the Demographic Alignment of LLMs with Human Perceptions of Offensiveness

Shayan Alipour; Indira Sen; Mattia Samory; Tanushree Mitra

arXiv:2411.08977 · cs.CY · November 25, 2024

TL;DR

This study systematically evaluates demographic biases in LLMs' alignment with human perceptions of offensiveness across multiple datasets, highlighting the influence of confounders like document difficulty and annotator sensitivity.

Contribution

The paper introduces a comprehensive, confounder-aware analysis of demographic biases in LLM-based offensive language detection across five datasets.

Findings

01

Demographic traits, especially race, influence alignment, but their effects are inconsistent across datasets.

02

Confounders such as annotator sensitivity and document difficulty explain more variation in alignment than demographic traits alone.

03

Alignment increases with annotator sensitivity and within-group agreement, and decreases with document difficulty.
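
The quantities named in these findings can be made concrete with a small sketch. The definitions below are illustrative assumptions, not the paper's own operationalization: alignment is taken as the share of an annotator's labels that match the LLM's, sensitivity as the share of items an annotator marks offensive, and difficulty as one minus the share of the majority human label on a document. The toy data and the function names annotator_stats and document_difficulty are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: (annotator_id, doc_id, human_label, llm_label); 1 = offensive.
annotations = [
    ("a1", "d1", 1, 1),
    ("a1", "d2", 1, 1),
    ("a1", "d3", 0, 0),
    ("a2", "d1", 1, 1),
    ("a2", "d2", 0, 1),
    ("a2", "d3", 0, 0),
    ("a3", "d1", 1, 1),
    ("a3", "d2", 0, 1),
    ("a3", "d3", 0, 0),
]


def annotator_stats(rows):
    """Per-annotator alignment with the LLM and sensitivity.

    Assumed definitions (not taken from the paper):
    alignment   = share of the annotator's labels equal to the LLM label
    sensitivity = share of the annotator's labels that are 'offensive'
    """
    by_annotator = defaultdict(list)
    for annotator, _doc, human, llm in rows:
        by_annotator[annotator].append((human, llm))
    stats = {}
    for annotator, pairs in by_annotator.items():
        n = len(pairs)
        stats[annotator] = {
            "alignment": sum(h == m for h, m in pairs) / n,
            "sensitivity": sum(h for h, _ in pairs) / n,
        }
    return stats


def document_difficulty(rows):
    """Per-document difficulty, assumed here to be 1 minus the majority-label share."""
    by_doc = defaultdict(list)
    for _annotator, doc, human, _llm in rows:
        by_doc[doc].append(human)
    return {
        doc: 1 - max(sum(labels), len(labels) - sum(labels)) / len(labels)
        for doc, labels in by_doc.items()
    }
```

With per-annotator and per-document values like these in hand, one could regress alignment on demographic traits together with the confounders and compare how much variation each explains; the specific model used in the paper is not stated on this page.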

Abstract

Large language models (LLMs) are known to exhibit demographic biases, yet few studies systematically evaluate these biases across multiple datasets or account for confounding factors. In this work, we examine LLM alignment with human annotations in five offensive language datasets, comprising approximately 220K annotations. Our findings reveal that while demographic traits, particularly race, influence alignment, these effects are inconsistent across datasets and often entangled with other factors. Confounders -- such as document difficulty, annotator sensitivity, and within-group agreement -- account for more variation in alignment patterns than demographic traits alone. Specifically, alignment increases with higher annotator sensitivity and group agreement, while greater document difficulty corresponds to reduced alignment. Our results underscore the importance of multi-dataset…

Code & Models

Repositories

shayanalipour/llm-alignment-bias (Official)

Taxonomy

Topics: Open Source Software Innovations