Data-Centric Human Preference with Rationales for Direct Preference Alignment
Hoang Anh Just, Ming Jin, Anit Sahu, Huy Phan, Ruoxi Jia

TL;DR
This paper introduces a data-centric approach to improve language model alignment by augmenting human preference data with machine-generated rationales, leading to faster learning and better performance.
Contribution
It proposes a simple framework for enriching preference datasets with rationales, enhancing learning efficiency and compatibility with existing preference optimization algorithms.
Findings
Rationale-augmented learning accelerates convergence.
Enriching data with rationales improves final model performance.
The approach is versatile across different preference optimization methods.
Abstract
Aligning language models with human preferences through reinforcement learning from human feedback is crucial for their safe and effective deployment. Human preferences are typically represented as pairwise comparisons, in which one response is chosen over another for a given prompt. However, standard preference datasets often lack explicit information on why a particular choice was made. This ambiguity can hinder efficient learning and robust alignment, especially given the high cost of acquiring extensive human annotations. While many studies focus on algorithmic improvements, this work adopts a data-centric perspective and explores how to enhance learning from existing preference data. We propose augmenting standard preference pairs with rationales that explain the reasoning behind the human preference. Specifically, we introduce a simple and principled framework that leverages…
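To make the idea of a rationale-augmented preference pair concrete, the sketch below shows one way such a record and a matching training objective might look. It is illustrative only: the abstract above is truncated before the framework is described, so the `PreferenceRecord` type, the `rationale_augmented_loss` function, and the weight `gamma` are hypothetical names and assumptions, not the authors' formulation. Only `dpo_loss` follows the standard Direct Preference Optimization objective for a single pair.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class PreferenceRecord:
    """One preference pair, optionally enriched with a rationale (hypothetical schema)."""
    prompt: str
    chosen: str
    rejected: str
    rationale: Optional[str] = None  # explanation of why `chosen` was preferred


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_lp: float, policy_rejected_lp: float,
             ref_chosen_lp: float, ref_rejected_lp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one pair, computed from sequence log-probabilities."""
    margin = beta * ((policy_chosen_lp - ref_chosen_lp)
                     - (policy_rejected_lp - ref_rejected_lp))
    # -log(sigmoid(margin)) equals softplus(-margin)
    return softplus(-margin)


def rationale_augmented_loss(policy_chosen_lp: float, policy_rejected_lp: float,
                             ref_chosen_lp: float, ref_rejected_lp: float,
                             rationale_lp: Optional[float] = None,
                             beta: float = 0.1, gamma: float = 1.0) -> float:
    """ASSUMPTION: DPO loss plus a weighted negative log-likelihood of the rationale.

    `rationale_lp` is the policy's log-probability of the rationale text;
    when it is None the record has no rationale and plain DPO is used.
    """
    loss = dpo_loss(policy_chosen_lp, policy_rejected_lp,
                    ref_chosen_lp, ref_rejected_lp, beta)
    if rationale_lp is not None:
        loss += gamma * (-rationale_lp)
    return loss
```

In this sketch a record without a rationale falls back to plain DPO, so rationale-enriched and standard pairs could be mixed in one dataset. How the rationale term is actually defined and weighted would need to be checked against the full paper.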
Taxonomy
Topics: Multi-Criteria Decision Making · Data Management and Algorithms
Methods: Focus
