Self-Boosting Large Language Models with Synthetic Preference Data

Qingxiu Dong; Li Dong; Xingxing Zhang; Zhifang Sui; Furu Wei

arXiv:2410.06961·cs.CL·October 10, 2024

TL;DR

SynPO is a self-boosting method for LLMs that trains on synthetic preference data over iterative rounds of self-improvement, reducing reliance on costly human annotation and improving performance across multiple benchmarks.

Contribution

The paper presents SynPO, a self-boosting paradigm that pairs a self-prompt generator with a response improver to produce synthetic preference data, improving LLM alignment without large-scale annotation of prompts or human preferences.

Findings

1. Significant performance improvements on the AlpacaEval 2.0 and ArenaHard benchmarks.
2. Win rate improvement of over 22.1% for Llama3-8B and Mistral-7B after four SynPO iterations.
3. Improved general performance on the Open LLM leaderboard.

Abstract

Through alignment with human preferences, Large Language Models (LLMs) have advanced significantly in generating honest, harmless, and helpful responses. However, collecting high-quality preference data is a resource-intensive and creativity-demanding process, especially for the continual improvement of LLMs. We introduce SynPO, a self-boosting paradigm that leverages synthetic preference data for model alignment. SynPO employs an iterative mechanism wherein a self-prompt generator creates diverse prompts, and a response improver refines model responses progressively. This approach trains LLMs to autonomously learn the generative rewards for their own outputs and eliminates the need for large-scale annotation of prompts and human preferences. After four SynPO iterations, Llama3-8B and Mistral-7B show significant enhancements in instruction-following abilities, achieving over 22.1% win…
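The iterative mechanism described in the abstract can be pictured with a small sketch. This is not the authors' implementation: the function names (`build_preference_pairs`, `self_boost`), the callables passed in, and the `PreferencePair` type are all hypothetical, and treating the improved response as "chosen" and the original draft as "rejected" is an assumption about how the synthetic preference pairs are formed. The training step is left as a placeholder because the abstract does not say which preference-optimization objective is used.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # improved response (assumed to be the preferred one)
    rejected: str  # original model draft (assumed to be the dispreferred one)


def build_preference_pairs(
    prompts: List[str],
    respond: Callable[[str], str],
    improve: Callable[[str, str], str],
) -> List[PreferencePair]:
    """Turn prompts into synthetic preference pairs (hypothetical helper)."""
    pairs = []
    for prompt in prompts:
        draft = respond(prompt)
        refined = improve(prompt, draft)
        # A pair with identical responses carries no preference signal.
        if refined != draft:
            pairs.append(PreferencePair(prompt, refined, draft))
    return pairs


def self_boost(
    generate_prompts: Callable[[int], List[str]],
    respond: Callable[[str], str],
    improve: Callable[[str, str], str],
    train_step: Callable[[Callable[[str], str], List[PreferencePair]], Callable[[str], str]],
    iterations: int = 4,
) -> Tuple[Callable[[str], str], List[List[PreferencePair]]]:
    """Run several rounds of prompt generation, improvement, and training."""
    history = []
    for round_idx in range(iterations):
        prompts = generate_prompts(round_idx)      # stands in for the self-prompt generator
        pairs = build_preference_pairs(prompts, respond, improve)
        respond = train_step(respond, pairs)       # placeholder for preference optimization
        history.append(pairs)
    return respond, history
```

In a real setting, `generate_prompts`, `respond`, and `improve` would each wrap a language model call, and `train_step` would return an updated policy; here they are plain callables so the control flow can be read and tested in isolation.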

Code & Models

Models

chelleboyer/llm-mm-good-eb8e3f60-56f2-4729-8934-2428ca568d27
model · 1 dl

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems