Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs

Chris Yuhao Liu; Liang Zeng; Jiacai Liu; Rui Yan; Jujie He; Chaojie Wang; Shuicheng Yan; Yang Liu; Yahui Zhou

arXiv:2410.18451·cs.AI·October 25, 2024

TL;DR

This paper presents data-centric techniques and a curated dataset to improve reward modeling in large language models, resulting in top-performing models on the RewardBench leaderboard.

Contribution

It introduces effective data selection and filtering strategies, yielding a smaller, high-quality preference dataset and new reward models that outperform existing models on RewardBench.

Findings

1. Skywork-Reward-Gemma-27B tops the RewardBench leaderboard.

2. The curated dataset contains only 80K preference pairs.

3. The techniques and datasets have improved the performance of many other top-ranked models on RewardBench.

Abstract

In this report, we introduce a collection of methods to enhance reward modeling for LLMs, focusing specifically on data-centric techniques. We propose effective data selection and filtering strategies for curating high-quality open-source preference datasets, culminating in the Skywork-Reward data collection, which contains only 80K preference pairs -- significantly smaller than existing datasets. Using this curated dataset, we developed the Skywork-Reward model series -- Skywork-Reward-Gemma-27B and Skywork-Reward-Llama-3.1-8B -- with the former currently holding the top position on the RewardBench leaderboard. Notably, our techniques and datasets have directly enhanced the performance of many top-ranked models on RewardBench, highlighting the practical impact of our contributions in real-world preference learning applications.

Taxonomy

Topics: Business Process Modeling and Analysis