Data-Centric Human Preference with Rationales for Direct Preference Alignment

Hoang Anh Just; Ming Jin; Anit Sahu; Huy Phan; Ruoxi Jia

arXiv:2407.14477 · cs.LG · July 15, 2025


TL;DR

This paper introduces a data-centric approach to improve language model alignment by augmenting human preference data with machine-generated rationales, leading to faster learning and better performance.

Contribution

The paper proposes a simple framework for enriching preference datasets with rationales. The framework improves learning efficiency and remains compatible with existing preference optimization algorithms.

Findings

01. Rationale-augmented learning accelerates convergence.

02. Enriching data with rationales improves final model performance.

03. The approach is versatile across different preference optimization methods.

Abstract

Aligning language models with human preferences through reinforcement learning from human feedback is crucial for their safe and effective deployment. Human preference is typically represented as a comparison in which one response is chosen over another for a given prompt. However, standard preference datasets often lack explicit information on why a particular choice was made, an ambiguity that can hinder efficient learning and robust alignment, especially given the high cost of acquiring extensive human annotations. While many studies focus on algorithmic improvements, this work adopts a data-centric perspective, exploring how to enhance learning from existing preference data. We propose augmenting standard preference pairs with rationales that explain the reasoning behind the human preference. Specifically, we introduce a simple and principled framework that leverages…
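To make the data-centric idea concrete, the following is a minimal sketch of a preference pair extended with a rationale field. It is not the authors' implementation: the names `PreferencePair`, `add_rationales`, and `placeholder_rationale` are hypothetical, and the placeholder function merely stands in for whatever model would produce the machine-generated rationales.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class PreferencePair:
    """One comparison: for `prompt`, `chosen` was preferred over `rejected`."""
    prompt: str
    chosen: str
    rejected: str
    # Hypothetical extra field: a short explanation of why `chosen` was preferred.
    rationale: Optional[str] = None


def add_rationales(
    pairs: List[PreferencePair],
    generate_rationale: Callable[[PreferencePair], str],
) -> List[PreferencePair]:
    """Return copies of `pairs` with a rationale attached to each one.

    Pairs that already carry a rationale are returned unchanged.
    """
    return [
        pair if pair.rationale is not None
        else replace(pair, rationale=generate_rationale(pair))
        for pair in pairs
    ]


def placeholder_rationale(pair: PreferencePair) -> str:
    """Stand-in for a rationale generator (assumption: a real one would query a model)."""
    return f"[rationale for preferring {pair.chosen!r} over {pair.rejected!r}]"


if __name__ == "__main__":
    data = [PreferencePair("What is 2 + 2?", chosen="4", rejected="5")]
    for pair in add_rationales(data, placeholder_rationale):
        print(pair)
```

Keeping the rationale as an optional field means the augmented records can still be consumed by code that reads only `prompt`, `chosen`, and `rejected`, which matches the stated goal of compatibility with existing preference optimization algorithms.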


Code & Models

Repositories

reds-lab/preference-learning-with-rationales
PyTorch · Official


Taxonomy

Topics: Multi-Criteria Decision Making · Data Management and Algorithms

Methods: Focus