How Humans Help LLMs: Assessing and Incentivizing Human Preference Annotators

Shang Liu; Hanzhao Wang; Zhongyao Ma; Xiaocheng Li

arXiv:2502.06387 · cs.LG · April 8, 2026

TL;DR

This paper studies how to monitor and incentivize the human annotators who produce preference data for LLM training. It proposes a self-consistency monitoring scheme and analyzes the sample complexity of both monitoring and incentive contracts.

Contribution

It introduces a self-consistency monitoring method tailored for preference annotation and analyzes the sample complexity of incentivization contracts.

Findings

01. Self-consistency monitoring can outperform expert-based monitoring under certain conditions.

02. Linear contracts are rate-optimal among general contracts for incentivizing annotators.

03. The sample complexity for effective incentivization scales as 1/(I n) for linear contracts.

Abstract

Human-annotated preference data play an important role in aligning large language models (LLMs). In this paper, we study two connected questions: how to monitor the quality of human preference annotators and how to incentivize them to provide high-quality annotations. In current practice, expert-based monitoring is a natural workhorse for quality control, but it performs poorly in preference annotation because annotators are heterogeneous and downstream model performance is an indirect and noisy proxy for annotation quality. We therefore propose a self-consistency monitoring scheme tailored to preference annotation, and analyze the statistical sample complexity of both methods. This practitioner-facing analysis identifies how many inspected samples are needed to reliably assess an annotator and shows when self-consistency monitoring can outperform expert-based monitoring. We then use…
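
As a rough illustration of the monitoring idea in the abstract, the sketch below shows one plausible reading of self-consistency monitoring: the same comparison is shown to an annotator twice and the agreement rate is measured. The abstract does not spell out the paper's actual scheme, so the function names, the label-flip annotator model, and the use of a generic Hoeffding bound for the number of inspected samples are all assumptions made here for illustration, not the authors' method or their sample complexity result.

```python
import math
import random


def hoeffding_sample_size(epsilon: float, delta: float) -> int:
    """Number of inspected pairs so that an empirical rate is within
    epsilon of its true value with probability at least 1 - delta.
    Generic Hoeffding bound (an assumption, not the paper's analysis):
    n >= ln(2 / delta) / (2 * epsilon^2)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def self_consistency_rate(first_pass, second_pass) -> float:
    """Fraction of repeated comparisons on which the annotator gave
    the same preference label both times."""
    if len(first_pass) != len(second_pass) or len(first_pass) == 0:
        raise ValueError("passes must be non-empty and of equal length")
    agree = sum(a == b for a, b in zip(first_pass, second_pass))
    return agree / len(first_pass)


def simulate_annotator(n_pairs: int, flip_prob: float, rng: random.Random):
    """Hypothetical annotator model: each pair has a latent preference,
    and every reported label flips independently with flip_prob.
    Expected agreement is (1 - q)^2 + q^2 for flip probability q."""
    first, second = [], []
    for _ in range(n_pairs):
        latent = rng.random() < 0.5
        first.append(latent ^ (rng.random() < flip_prob))
        second.append(latent ^ (rng.random() < flip_prob))
    return first, second


if __name__ == "__main__":
    n = hoeffding_sample_size(epsilon=0.05, delta=0.05)
    a, b = simulate_annotator(n, flip_prob=0.1, rng=random.Random(0))
    print(n, self_consistency_rate(a, b))
```

Under this toy model a careful annotator (small flip probability) agrees with themselves far more often than a careless one, which is why the agreement rate can serve as a quality signal without an expert-provided reference label.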
