Just Say What You Want: Only-prompting Self-rewarding Online Preference Optimization
Ruijie Xu, Zhihan Liu, Yongfei Liu, Shipeng Yan, Zhaoran Wang, Zhi Zhang, Xuming He

TL;DR
This paper introduces a novel online RLHF method that uses only prompts for self-rewarding, reducing reliance on judgment models, and improves model alignment by generating challenging negatives to better capture human preferences.
Contribution
The paper proposes an only-prompting self-rewarding online algorithm that generates preference data without judgment models and employs fine-grained control over training difficulty.
Findings
Achieved 34.5% win rate on AlpacaEval 2.0
Significantly improved performance of base models
Demonstrated effectiveness on Mistral-7B and Mistral-Instruct-7B
Abstract
We address the challenge of online Reinforcement Learning from Human Feedback (RLHF) with a focus on self-rewarding alignment methods. In online RLHF, obtaining feedback requires interaction with the environment, which can be costly when using additional reward models or the GPT-4 API. Current self-rewarding approaches rely heavily on the discriminator's judgment capabilities, which are effective for large-scale models but challenging to transfer to smaller ones. To address these limitations, we propose a novel, only-prompting self-rewarding online algorithm that generates preference datasets without relying on judgment capabilities. Additionally, we employ fine-grained arithmetic control over the optimality gap between positive and negative examples, generating more hard negatives in the later stages of training to help the model better capture subtle human preferences. Finally, we…
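The abstract's idea of shrinking the gap between positive and negative examples as training proceeds can be illustrated with a small sketch. This is not the authors' implementation: the linear schedule, the `gap` and `quality` parameters, the `PreferencePair` record, and the `generate` callable (a stand-in for prompting the policy model for a response of a requested quality) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str
    gap: float  # hypothetical target quality gap used for the negative


def gap_schedule(step: int, total_steps: int,
                 start_gap: float = 1.0, end_gap: float = 0.2) -> float:
    """Linearly shrink the target gap from start_gap to end_gap.

    Assumed schedule: a smaller gap late in training means harder negatives.
    """
    if total_steps <= 1:
        return end_gap
    # Clamp the step into [0, total_steps - 1] before interpolating.
    frac = min(max(step, 0), total_steps - 1) / (total_steps - 1)
    return start_gap + frac * (end_gap - start_gap)


def build_pair(prompt: str,
               generate: Callable[..., str],
               step: int,
               total_steps: int) -> PreferencePair:
    """Build one preference pair by prompting only, with no judge model.

    `generate(prompt, quality=...)` is a hypothetical callable that asks the
    model for a response at the requested quality level in [0, 1].
    """
    gap = gap_schedule(step, total_steps)
    chosen = generate(prompt, quality=1.0)
    rejected = generate(prompt, quality=1.0 - gap)
    return PreferencePair(prompt, chosen, rejected, gap)
```

Under these assumptions, early steps pair a strong response with a clearly weak one, while later steps pair it with a negative that is only slightly worse. The resulting pairs could then be fed to any preference-optimization objective.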
Taxonomy
Topics: Auction Theory and Applications · Optimization and Search Problems · Consumer Market Behavior and Pricing
Methods: Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Label Smoothing · Byte Pair Encoding · Absolute Position Encodings · Softmax · Layer Normalization · Dropout · Dense Connections
