RRHF: Rank Responses to Align Language Models with Human Feedback without tears

Zheng Yuan; Hongyi Yuan; Chuanqi Tan; Wei Wang; Songfang Huang; Fei Huang

arXiv:2304.05302 · cs.CL · October 10, 2023 · 37 cites

TL;DR

RRHF introduces a simplified, efficient method for aligning large language models with human preferences by ranking responses based on conditional probabilities, reducing complexity and hyperparameter sensitivity compared to PPO.

Contribution

The paper proposes RRHF, a novel ranking-based learning paradigm that simplifies and improves the process of aligning language models with human feedback, requiring fewer models and less hyperparameter tuning.

Findings

01

RRHF achieves comparable performance to PPO in alignment tasks.

02

RRHF requires only 1-2 models during tuning, simplifying implementation.

03

Performance depends heavily on the quality of the sampled responses, which suggests that RRHF behaves like a best-of-n learning approach.

Abstract

Reinforcement Learning from Human Feedback (RLHF) facilitates the alignment of large language models with human preferences, significantly enhancing the quality of interactions between humans and models. InstructGPT implements RLHF through several stages, including Supervised Fine-Tuning (SFT), reward model training, and Proximal Policy Optimization (PPO). However, PPO is sensitive to hyperparameters and requires multiple models in its standard implementation, making it hard to train and scale up to larger parameter counts. In contrast, we propose a novel learning paradigm called RRHF, which scores sampled responses from different sources via a logarithm of conditional probabilities and learns to align these probabilities with human preferences through ranking loss. RRHF can leverage sampled responses from various sources including the model responses from itself, other large language…
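The scoring and ranking step described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the per-token log-probabilities of each candidate response are already available, and it assumes a length-normalized score, a pairwise hinge-style ranking term, and an added fine-tuning term on the highest-reward response. None of these details are stated on this page, and all function names are hypothetical.

```python
from typing import List, Sequence


def sequence_score(token_logprobs: Sequence[float]) -> float:
    """Score one response as the mean of its per-token log-probabilities.

    Assumption: length normalization; the abstract only says the score is
    a logarithm of conditional probabilities.
    """
    if not token_logprobs:
        raise ValueError("response must contain at least one token")
    return sum(token_logprobs) / len(token_logprobs)


def ranking_loss(scores: Sequence[float], rewards: Sequence[float]) -> float:
    """Penalize every pair where the lower-reward response scores higher.

    Assumption: a zero-margin hinge over all pairs (i, j) with
    rewards[i] < rewards[j].
    """
    loss = 0.0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if rewards[i] < rewards[j]:
                loss += max(0.0, scores[i] - scores[j])
    return loss


def sft_loss(token_logprobs: List[Sequence[float]],
             rewards: Sequence[float]) -> float:
    """Negative log-likelihood of the highest-reward response (assumed term)."""
    best = max(range(len(rewards)), key=lambda k: rewards[k])
    return -sum(token_logprobs[best])


def rrhf_style_loss(token_logprobs: List[Sequence[float]],
                    rewards: Sequence[float]) -> float:
    """Combine the ranking term and the fine-tuning term with equal weight."""
    scores = [sequence_score(lp) for lp in token_logprobs]
    return ranking_loss(scores, rewards) + sft_loss(token_logprobs, rewards)


if __name__ == "__main__":
    # Three hypothetical candidate responses for one prompt.
    demo_logprobs = [[-0.5, -0.5], [-1.0, -3.0], [-0.2, -0.4, -0.6]]
    demo_rewards = [0.1, 0.9, 0.5]
    print(rrhf_style_loss(demo_logprobs, demo_rewards))
```

In a real training loop the log-probabilities would come from the language model being tuned and the rewards from a reward model or human labels, so that gradients flow through the scores. The sketch only shows how the ranking signal is assembled from those two inputs.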

Code & Models

Repositories

ganjinzero/rrhf
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques

Methods: Entropy Regularization · Shrink and Fine-Tune · Proximal Policy Optimization · ALIGN