How Humans Help LLMs: Assessing and Incentivizing Human Preference Annotators
Shang Liu, Hanzhao Wang, Zhongyao Ma, Xiaocheng Li

TL;DR
This paper studies how to monitor and incentivize the human annotators who produce preference data for LLM training. It proposes a self-consistency monitoring scheme and analyzes how many inspected samples are needed for reliable quality control.
Contribution
It introduces a self-consistency monitoring method tailored for preference annotation and analyzes the sample complexity of incentivization contracts.
Findings
Self-consistency monitoring can outperform expert-based monitoring under certain conditions.
Linear contracts are rate-optimal among general contracts for incentivizing annotators.
The sample complexity for effective incentivization scales as 1/(I n) for linear contracts.
Abstract
Human-annotated preference data play an important role in aligning large language models (LLMs). In this paper, we study two connected questions: how to monitor the quality of human preference annotators and how to incentivize them to provide high-quality annotations. In current practice, expert-based monitoring is a natural workhorse for quality control, but it performs poorly in preference annotation because annotators are heterogeneous and downstream model performance is an indirect and noisy proxy for annotation quality. We therefore propose a self-consistency monitoring scheme tailored to preference annotation, and analyze the statistical sample complexity of both methods. This practitioner-facing analysis identifies how many inspected samples are needed to reliably assess an annotator and shows when self-consistency monitoring can outperform expert-based monitoring. We then use…
