Self-Boosting Large Language Models with Synthetic Preference Data
Qingxiu Dong, Li Dong, Xingxing Zhang, Zhifang Sui, Furu Wei

TL;DR
SynPO introduces a self-boosting method for LLMs that uses synthetic preference data and iterative self-improvement, reducing reliance on costly human annotations and enhancing model performance across multiple benchmarks.
Contribution
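The paper presents SynPO, a self-boosting paradigm that aligns LLMs using synthetic preference data: a self-prompt generator creates diverse prompts and a response improver progressively refines the model's responses, so alignment no longer depends on large-scale human annotation of prompts and preferences.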
Findings
Significant performance improvements on AlpacaEval 2.0 and ArenaHard benchmarks.
Win rate improvement of over 22.1% for Llama3-8B and Mistral-7B after four SynPO iterations.
Enhanced general performance on the Open LLM leaderboard.
Abstract
Through alignment with human preferences, Large Language Models (LLMs) have advanced significantly in generating honest, harmless, and helpful responses. However, collecting high-quality preference data is a resource-intensive and creativity-demanding process, especially for the continual improvement of LLMs. We introduce SynPO, a self-boosting paradigm that leverages synthetic preference data for model alignment. SynPO employs an iterative mechanism wherein a self-prompt generator creates diverse prompts, and a response improver refines model responses progressively. This approach trains LLMs to autonomously learn the generative rewards for their own outputs and eliminates the need for large-scale annotation of prompts and human preferences. After four SynPO iterations, Llama3-8B and Mistral-7B show significant enhancements in instruction-following abilities, achieving over 22.1% win…
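The iterative mechanism described in the abstract can be pictured with a minimal sketch. It is not the authors' implementation: every function name below is hypothetical, the toy stubs stand in for real LLM calls and for a preference-optimization training step, and pairing the refined response as "chosen" against the original draft as "rejected" is an assumption about how the synthetic preference data is formed.

```python
from typing import Callable, Dict, List

PreferencePair = Dict[str, str]
Policy = Callable[[str], str]


def build_preference_pairs(
    prompts: List[str],
    generate: Policy,
    improve: Callable[[str, str], str],
) -> List[PreferencePair]:
    """Turn prompts into synthetic (chosen, rejected) pairs."""
    pairs: List[PreferencePair] = []
    for prompt in prompts:
        draft = generate(prompt)           # current model's response
        refined = improve(prompt, draft)   # response improver's rewrite
        if refined != draft:               # identical texts carry no preference signal
            pairs.append({"prompt": prompt, "chosen": refined, "rejected": draft})
    return pairs


def synpo_loop(
    make_prompts: Callable[[int], List[str]],
    generate: Policy,
    improve: Callable[[str, str], str],
    train: Callable[[Policy, List[PreferencePair]], Policy],
    iterations: int = 4,
) -> List[List[PreferencePair]]:
    """Run several self-boosting rounds; return the pairs built in each round."""
    history: List[List[PreferencePair]] = []
    for i in range(iterations):
        prompts = make_prompts(i)                   # self-prompt generator
        pairs = build_preference_pairs(prompts, generate, improve)
        generate = train(generate, pairs)           # updated policy feeds the next round
        history.append(pairs)
    return history


# Toy stand-ins (assumptions) so the sketch runs without any model.
def toy_make_prompts(i: int) -> List[str]:
    return [f"Explain topic {i}-{j}" for j in range(2)]


def toy_generate(prompt: str) -> str:
    return f"draft answer to: {prompt}"


def toy_improve(prompt: str, response: str) -> str:
    return response + " (refined)"


def toy_train(policy: Policy, pairs: List[PreferencePair]) -> Policy:
    return policy  # placeholder: a real step would optimize on the pairs


history = synpo_loop(toy_make_prompts, toy_generate, toy_improve, toy_train)
```

With the default of four iterations, `history` holds one list of pairs per round. In a real setting the stubs would be replaced by model calls, and `toy_train` by an actual preference-optimization update.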
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
