Online Self-Preferring Language Models

Yuanzhao Zhai; Zhuo Zhang; Kele Xu; Hanyang Peng; Yue Yu; Dawei Feng; Cheng Yang; Bo Ding; Huaimin Wang

arXiv:2405.14103 · cs.LG · May 24, 2024

TL;DR

This paper introduces Online Self-Preferring (OSP) language models, which learn from self-generated response pairs and self-judged preference strengths, improving alignment and robustness and enabling self-improvement without external supervision.

Contribution

The paper proposes OSP, a method that explicitly models preference strength, yielding better alignment and robustness than existing offline and online methods.

Findings

01. OSP achieves state-of-the-art alignment performance.

02. Leveraging preference strength prevents overfitting.

03. OSP models can self-improve without external supervision.

Abstract

Aligning with human preference datasets has been critical to the success of large language models (LLMs). Reinforcement learning from human feedback (RLHF) employs a costly reward model to provide feedback for on-policy sampling responses. Recently, offline methods that directly fit responses with binary preferences in the dataset have emerged as alternatives. However, existing methods do not explicitly model preference strength information, which is crucial for distinguishing different response pairs. To overcome this limitation, we propose Online Self-Preferring (OSP) language models to learn from self-generated response pairs and self-judged preference strengths. For each prompt and corresponding self-generated responses, we introduce a ranked pairing method to construct multiple response pairs with preference strength information. We then propose the soft-preference cross-entropy…
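The abstract is cut off before it defines the ranked pairing method or the soft-preference cross-entropy loss, so the following is only a rough Python sketch of the general idea, not the authors' formulation. It assumes each self-generated response already has a scalar self-judged score, that preference strength is the score gap between two responses, and that the soft label is a sigmoid of that gap. The function names, the `temperature` parameter, and the `margin` input (standing in for whatever logit the policy assigns to "first response preferred") are all hypothetical.

```python
import math
from itertools import combinations


def ranked_pairs(responses, scores):
    """Rank responses by score and pair each with every lower-ranked one.

    Returns (preferred, rejected, strength) tuples. Using the score gap as
    the strength is an assumption made for illustration.
    """
    ranked = sorted(zip(responses, scores), key=lambda rs: rs[1], reverse=True)
    pairs = []
    # combinations() keeps the sorted order, so the first item of each pair
    # always has a score greater than or equal to the second.
    for (win, s_win), (lose, s_lose) in combinations(ranked, 2):
        pairs.append((win, lose, s_win - s_lose))
    return pairs


def soft_label(strength, temperature=1.0):
    """Map a preference strength to a soft target in (0, 1) via a sigmoid.

    The sigmoid mapping and the temperature are hypothetical choices.
    """
    return 1.0 / (1.0 + math.exp(-strength / temperature))


def soft_cross_entropy(margin, target):
    """Binary cross-entropy between sigmoid(margin) and a soft target.

    Uses the stable identity: loss = softplus(margin) - target * margin.
    """
    softplus = max(margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return softplus - target * margin


if __name__ == "__main__":
    for win, lose, strength in ranked_pairs(["a", "b", "c"], [1.0, 3.0, 2.0]):
        target = soft_label(strength)
        print(win, lose, strength, round(soft_cross_entropy(0.0, target), 4))
```

With a hard binary label the target would always be 1, which pushes the margin up without bound. A soft target below 1 gives the loss a finite minimum at the margin whose sigmoid equals the target, which is one plausible reading of how strength information could limit overfitting.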

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling