Towards Data-Centric RLHF: Simple Metrics for Preference Dataset Comparison

Judy Hanwen Shen; Archit Sharma; Jun Qin

arXiv:2409.09603·cs.AI·September 17, 2024

TL;DR

This paper introduces metrics for evaluating and comparing the preference datasets used in RLHF along three perspectives (scale, label noise, and information content), in support of a data-centric approach to alignment.

Contribution

The paper proposes specific metrics for scale, label noise, and information content, enabling systematic comparison of preference datasets and a better understanding of them for RLHF data collection and reward model training.

Findings

1. Metrics reveal differences in dataset quality and usefulness.
2. Preference datasets vary significantly in scale, noise, and information content.
3. Guidelines for selecting and improving preference datasets are provided.

Abstract

The goal of aligning language models to human preferences requires data that reveal these preferences. Ideally, time and money can be spent carefully collecting and tailoring bespoke preference data to each downstream application. However, in practice, a select few publicly available preference datasets are often used to train reward models for reinforcement learning from human feedback (RLHF). While new preference datasets are being introduced with increasing frequency, there are currently no existing efforts to measure and compare these datasets. In this paper, we systematically study preference datasets through three perspectives: scale, label noise, and information content. We propose specific metrics for each of these perspectives and uncover different axes of comparison for a better understanding of preference datasets. Our work is a first step towards a data-centric approach to…

Taxonomy

Topics: Data Management and Algorithms