Diverging Preferences: When do Annotators Disagree and do Models Know?

Michael JQ Zhang; Zhilin Wang; Jena D. Hwang; Yi Dong; Olivier Delalleau; Yejin Choi; Eunsol Choi; Xiang Ren; Valentina Pyatkin

arXiv:2410.14632 · cs.CL · March 4, 2026

TL;DR

This paper investigates why human annotators disagree in preference datasets. It finds that most disagreements stem from task underspecification or response style rather than simple noise, which affects both reward modeling and the evaluation of language models.

Contribution

The paper introduces a taxonomy of disagreement sources (ten categories across four high-level classes), challenges the assumption that annotator disagreement is simple noise, and proposes methods to identify and mitigate diverging preferences in LLM training and evaluation.

Findings

01. Most disagreements arise from task underspecification and response style.

02. Standard reward modeling and evaluation methods often fail to account for diverging preferences.

03. Proposed methods help identify and mitigate the influence of diverging preferences.

Abstract

We examine diverging preferences in human-labeled preference datasets. We develop a taxonomy of disagreement sources spanning ten categories across four high-level classes and find that the majority of disagreements are due to factors such as task underspecification or response style. Our findings challenge a standard assumption in reward modeling methods that annotator disagreements can be attributed to simple noise. We then explore how these findings impact two areas of LLM development: reward modeling training and evaluation. In our experiments, we demonstrate how standard reward modeling (e.g., Bradley-Terry) and LLM-as-Judge evaluation methods fail to account for divergence between annotators. These findings highlight challenges in LLM evaluations, which are greatly influenced by divisive features like response style, and in developing pluralistically aligned LLMs. To address these…
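To make the Bradley-Terry point concrete, here is a minimal sketch of the standard Bradley-Terry preference probability and its cross-entropy loss. It is not the paper's method, and it does not fill in the part of the abstract that is cut off. The function names, the vote counts, and the soft-target variant are assumptions chosen for illustration. The sketch shows only that training against a single forced winner keeps rewarding a larger reward gap, even when annotators are evenly split.

```python
import math


def bt_prob(reward_a: float, reward_b: float) -> float:
    """Bradley-Terry probability that response A is preferred over B."""
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))


def bt_loss(reward_a: float, reward_b: float, target_a: float = 1.0) -> float:
    """Cross-entropy between a target preference for A and the BT prediction.

    target_a = 1.0 is the usual hard label ("A wins").
    A fractional target_a is an illustrative soft label (assumption).
    """
    p = bt_prob(reward_a, reward_b)
    return -(target_a * math.log(p) + (1.0 - target_a) * math.log(1.0 - p))


# Hypothetical annotation: four annotators, evenly split between A and B.
votes_for_a, votes_total = 2, 4
soft_target = votes_for_a / votes_total  # 0.5

# Candidate reward gaps r(A) - r(B) that a reward model might output.
gaps = [0.0, 1.0, 3.0]

# Hard label: the pipeline forces "A wins" despite the split.
hard_losses = [bt_loss(g, 0.0, 1.0) for g in gaps]

# Soft label: the target keeps the 50/50 split.
soft_losses = [bt_loss(g, 0.0, soft_target) for g in gaps]

for g, hard, soft in zip(gaps, hard_losses, soft_losses):
    print(f"gap={g:.1f}  hard-label loss={hard:.3f}  soft-label loss={soft:.3f}")
```

Under the hard label the loss keeps falling as the gap grows, so the model is pushed toward a confident preference that the annotators did not share. Under the even-split soft target the loss is lowest at a gap of zero. This is one simple way to see why collapsing disagreement into a single label discards information.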

Taxonomy

Topics: Speech and dialogue systems