RRHF: Rank Responses to Align Language Models with Human Feedback without tears
Zheng Yuan, Hongyi Yuan, Chuanqi Tan, Wei Wang, Songfang Huang, Fei Huang

TL;DR
RRHF introduces a simplified, efficient method for aligning large language models with human preferences by ranking responses based on conditional probabilities, reducing complexity and hyperparameter sensitivity compared to PPO.
Contribution
The paper proposes RRHF, a novel ranking-based learning paradigm that simplifies and improves the process of aligning language models with human feedback, requiring fewer models and less hyperparameter tuning.
Findings
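RRHF achieves performance comparable to PPO on alignment tasks.
RRHF needs only 1 to 2 models during tuning, whereas standard PPO-based RLHF keeps four (policy, value, reward, and reference models), which makes RRHF simpler to implement.
Performance depends heavily on the quality of the sampled responses, suggesting that RRHF behaves like a best-of-n learner.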
Abstract
Reinforcement Learning from Human Feedback (RLHF) facilitates the alignment of large language models with human preferences, significantly enhancing the quality of interactions between humans and models. InstructGPT implements RLHF through several stages, including Supervised Fine-Tuning (SFT), reward model training, and Proximal Policy Optimization (PPO). However, PPO is sensitive to hyperparameters and requires multiple models in its standard implementation, making it hard to train and scale up to larger parameter counts. In contrast, we propose a novel learning paradigm called RRHF, which scores sampled responses from different sources via a logarithm of conditional probabilities and learns to align these probabilities with human preferences through ranking loss. RRHF can leverage sampled responses from various sources including the model responses from itself, other large language…
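The scoring-and-ranking idea in the abstract can be sketched as below. This is a minimal, framework-free sketch based on our reading of the paper, not the authors' code: each response's score is its length-normalized log-probability, a zero-margin hinge penalizes every pair the model orders differently from the reward, and a fine-tuning term adds the negative log-likelihood of the highest-reward response. The function names `length_normalized_score` and `rrhf_loss` are hypothetical, and the per-token log-probabilities and rewards are assumed to be precomputed plain floats.

```python
from typing import List, Sequence, Tuple


def length_normalized_score(token_logprobs: Sequence[float]) -> float:
    """Score p_i of one response: mean per-token log-probability under the policy."""
    if not token_logprobs:
        raise ValueError("response must contain at least one token")
    return sum(token_logprobs) / len(token_logprobs)


def rrhf_loss(
    token_logprobs: List[Sequence[float]],
    rewards: Sequence[float],
) -> Tuple[float, float, float]:
    """Return (total, ranking, sft) for the k sampled responses to one query.

    Hypothetical helper: token_logprobs[i] holds log P(y_i,t | x, y_i,<t)
    for each token t of response i; rewards[i] is its reward-model score.
    """
    if len(token_logprobs) != len(rewards):
        raise ValueError("need exactly one reward per response")
    scores = [length_normalized_score(lp) for lp in token_logprobs]
    k = len(scores)

    # Ranking term: whenever response i is rewarded lower than response j,
    # penalize the model if it scores i above j (hinge with zero margin).
    rank = 0.0
    for i in range(k):
        for j in range(k):
            if rewards[i] < rewards[j]:
                rank += max(0.0, scores[i] - scores[j])

    # Fine-tuning term: summed negative log-likelihood of the highest-reward
    # response, i.e. supervised learning on the best of the k samples.
    best = max(range(k), key=lambda idx: rewards[idx])
    sft = -sum(token_logprobs[best])

    return rank + sft, rank, sft


if __name__ == "__main__":
    # Three sampled responses; the model prefers response 1, the reward prefers 0.
    logprobs = [[-1.0, -1.0], [-0.5, -0.5, -0.5], [-2.0]]
    rewards = [0.9, 0.2, 0.1]
    print(rrhf_loss(logprobs, rewards))  # prints (2.5, 0.5, 2.0)
```

In actual training the log-probabilities would come from the policy model with gradients attached, so the same arithmetic would be written with tensor operations; the pure-Python version only shows how the two terms are computed. Only one model (the policy) is needed when rewards are precomputed, which is consistent with the 1 to 2 model count noted above.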
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques
Methods: Entropy Regularization · Shrink and Fine-Tune · Proximal Policy Optimization · ALIGN
