Self-supervised Preference Optimization: Enhance Your Language Model with Preference Degree Awareness
Jian Li, Haojing Huang, Yujia Zhang, Pengfei Xu, Xi Chen, Rui Song, Lida Shi, Jingwen Wang, Hao Xu

TL;DR
This paper introduces a Self-supervised Preference Optimization framework that enhances Large Language Models' ability to understand preference degrees, leading to improved performance over existing methods in preference-based training.
Contribution
The paper proposes a novel self-supervised preference degree loss that, when combined with alignment loss, improves LLMs' understanding of human preference intensities.
Findings
SPO significantly boosts preference optimization performance.
SPO achieves state-of-the-art results on multiple datasets.
The framework is compatible with existing preference methods.
Abstract
Recently, there has been significant interest in replacing the reward model in Reinforcement Learning with Human Feedback (RLHF) methods for Large Language Models (LLMs), such as Direct Preference Optimization (DPO) and its variants. These approaches commonly use a binary cross-entropy mechanism on pairwise samples, i.e., minimizing and maximizing the loss based on preferred or dis-preferred responses, respectively. However, while this training strategy omits the reward model, it also overlooks the varying preference degrees within different responses. We hypothesize that this is a key factor hindering LLMs from sufficiently understanding human preferences. To address this problem, we propose a novel Self-supervised Preference Optimization (SPO) framework, which constructs a self-supervised preference degree loss combined with the alignment loss, thereby helping LLMs improve their…
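The binary cross-entropy mechanism on pairwise samples that the abstract refers to can be sketched as follows. This is a minimal illustration of the standard DPO objective for a single preference pair, not the SPO preference degree loss, whose details are not given in the text above. The function name, argument names, and the default `beta=0.1` are assumptions made for the example.

```python
import math


def dpo_pairwise_loss(policy_logp_chosen, policy_logp_rejected,
                      ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are summed sequence log-probabilities under the trained policy
    and under a frozen reference model. Names and beta are illustrative.
    """
    # Implicit rewards: scaled log-ratio of policy to reference model.
    chosen_reward = beta * (policy_logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (policy_logp_rejected - ref_logp_rejected)
    margin = chosen_reward - rejected_reward

    # Binary cross-entropy with the fixed label "chosen wins":
    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

The training target here is the same binary label for every pair: nothing in the loss records whether the chosen response is slightly or strongly better than the rejected one. That is the gap in preference degree awareness the abstract points to.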
Taxonomy
Topics: Speech and dialogue systems · Natural Language Processing Techniques · Advanced Text Analysis Techniques
