Improving Reward Models with Synthetic Critiques

Zihuiwen Ye; Fraser Greenlee-Scott; Max Bartolo; Phil Blunsom; Jon Ander Campos; Matthias Gallé

arXiv:2405.20850 · cs.CL · October 21, 2024 · 1 citation


TL;DR

This paper introduces a method to enhance reward models for language models by using synthetic critiques generated by large language models, leading to better performance, efficiency, and robustness.

Contribution

The paper proposes leveraging synthetic natural language critiques to improve reward models, reducing dependence on human annotations and enhancing generalization.

Findings

01. Synthetic critiques improve RM performance.
02. Reduced need for human-labeled data.
03. Enhanced robustness and interpretability.

Abstract

Reward models (RMs) play a critical role in aligning language models through the process of reinforcement learning from human feedback. RMs are trained to predict a score reflecting human preference, which requires significant time and cost for human annotation. Additionally, RMs tend to quickly overfit on superficial features in the training set, hindering their generalization performance on unseen distributions. We propose a novel approach using synthetic natural language critiques generated by large language models to provide additional feedback, evaluating aspects such as instruction following, correctness, and style. This offers richer signals and more robust features for RMs to assess and score on. We demonstrate that high-quality critiques improve the performance and data efficiency of RMs initialized from different pretrained models, reducing the reliance on costly human…
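The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the input template, the names `build_rm_input` and `pairwise_loss`, and the choice of a Bradley-Terry style pairwise loss are all assumptions made for illustration, since the abstract only states that critiques are supplied to the RM as additional feedback and that the RM predicts a preference score.

```python
import math
from typing import Optional


def build_rm_input(prompt: str, response: str, critique: Optional[str] = None) -> str:
    """Join prompt, response, and an optional critique into one RM input string.

    The "Prompt:/Response:/Critique:" template is hypothetical; the abstract
    does not specify how critiques are formatted.
    """
    parts = [f"Prompt: {prompt}", f"Response: {response}"]
    if critique:
        # The critique would come from an LLM; here it is simply passed in as text.
        parts.append(f"Critique: {critique}")
    return "\n".join(parts)


def pairwise_loss(score_chosen: float, score_rejected: float) -> float:
    """Bradley-Terry style loss, -log(sigmoid(score_chosen - score_rejected)).

    This is a common RM training objective, assumed here rather than taken
    from the abstract. Both branches are algebraically equal; the split only
    avoids overflow in exp() for large negative margins.
    """
    margin = score_chosen - score_rejected
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    text = build_rm_input(
        "Summarize the article in one sentence.",
        "The article is about reward models.",
        critique="Follows the length constraint but omits the main finding.",
    )
    print(text)
    # Scores would come from a trained RM; these values are placeholders.
    print(pairwise_loss(score_chosen=1.2, score_rejected=0.3))
```

In this sketch the critique is just extra text in the RM input, so the same scoring model and loss can be used with or without critiques, which makes it straightforward to compare the two settings.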

