Towards Comprehensive Preference Data Collection for Reward Modeling
Yulan Hu, Qingyang Li, Sheng Ouyang, Ge Chen, Kaihui Chen, Lijun Mei, Xucheng Ye, Fuzheng Zhang, Yong Liu

TL;DR
This paper introduces a structured, four-step framework for collecting high-quality preference data in reinforcement learning from human feedback, aiming to improve reward models for language models.
Contribution
It proposes a novel comprehensive framework for preference data collection, decomposing the process into four steps to enhance data quality and reduce human labor reliance.
Findings
The framework improves the quality of preference data.
Experiments show the effectiveness of the proposed method.
Structured data collection enhances reward model training.
Abstract
Reinforcement Learning from Human Feedback (RLHF) facilitates the alignment of large language models (LLMs) with human preferences, thereby enhancing the quality of responses generated. A critical component of RLHF is the reward model, which is trained on preference data and outputs a scalar reward during the inference stage. However, the collection of preference data still lacks thorough investigation. Recent studies indicate that preference data is collected either by AI or humans, where chosen and rejected instances are identified among pairwise responses. We question whether this process effectively filters out noise and ensures sufficient diversity in collected data. To address these concerns, for the first time, we propose a comprehensive framework for preference data collection, decomposing the process into four incremental steps: Prompt Generation, Response Generation, Response…
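As a rough illustration of how a reward model consumes the chosen/rejected pairs described in the abstract, the sketch below shows a pairwise preference record and a Bradley-Terry style loss. This is a common formulation in RLHF work and is an assumption here; the abstract does not state which loss the authors use. The names `PreferencePair` and `pairwise_loss` are hypothetical, and the reward values are made-up numbers rather than outputs of a trained model.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    """One preference record: a prompt with a chosen and a rejected response."""
    prompt: str
    chosen: str
    rejected: str


def pairwise_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry style loss: -log(sigmoid(reward_chosen - reward_rejected)).

    Assumed formulation, not taken from the paper. The loss is small when the
    chosen response receives a higher scalar reward than the rejected one.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow for large |m|.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Illustrative data only: the rewards are placeholders for a model's scalar outputs.
pair = PreferencePair(
    prompt="Explain what a reward model does.",
    chosen="It assigns a scalar score to a response given a prompt.",
    rejected="It is a kind of database.",
)
loss_when_ranked_correctly = pairwise_loss(reward_chosen=1.2, reward_rejected=-0.3)
loss_when_ranked_wrongly = pairwise_loss(reward_chosen=-0.3, reward_rejected=1.2)
print(loss_when_ranked_correctly < loss_when_ranked_wrongly)
```

When the two rewards are equal the loss is log 2, and it shrinks toward zero as the chosen response's reward pulls ahead. Under this formulation, noisy or mislabeled pairs push the model toward the wrong ordering, which is one reason the quality of the collected pairs matters.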
Taxonomy
Topics: Data Management and Algorithms
