Self-Evolution Fine-Tuning for Policy Optimization

Ruijun Chen; Jiehao Liang; Shiping Gao; Fanqi Wan; Xiaojun Quan

arXiv:2406.10813 · cs.CL · June 18, 2024

Open Access

TL;DR

This paper introduces self-evolution fine-tuning (SEFT), a novel method for aligning large language models that uses unannotated data and an adaptive reviser to improve responses without requiring costly annotations.

Contribution

SEFT eliminates the need for annotated samples in policy optimization by using an adaptive reviser and unannotated data, improving stability and efficiency over existing methods.

Findings

01. SEFT outperforms traditional fine-tuning and RLHF on benchmarks.

02. SEFT effectively leverages unlimited unannotated data.

03. SEFT maintains high response quality with reduced annotation effort.

Abstract

The alignment of large language models (LLMs) is crucial not only for unlocking their potential in specific tasks but also for ensuring that responses meet human expectations and adhere to safety and ethical principles. Current alignment methodologies face considerable challenges. For instance, supervised fine-tuning (SFT) requires extensive, high-quality annotated samples, while reinforcement learning from human feedback (RLHF) is complex and often unstable. In this paper, we introduce self-evolution fine-tuning (SEFT) for policy optimization, with the aim of eliminating the need for annotated samples while retaining the stability and efficiency of SFT. SEFT first trains an adaptive reviser to elevate low-quality responses while maintaining high-quality ones. The reviser then gradually guides the policy's optimization by fine-tuning it with enhanced responses. One of the prominent…
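
The two stages named in the abstract (a reviser that elevates low-quality responses while keeping high-quality ones, followed by fine-tuning the policy on the enhanced responses) can be illustrated with a toy data-construction loop. This is only a sketch under stated assumptions: the function names, the type aliases, and the word-count quality heuristic are all hypothetical stand-ins, not the authors' implementation, and the actual fine-tuning step is left out.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces; the abstract does not specify any API.
Policy = Callable[[str], str]        # prompt -> response
Reviser = Callable[[str, str], str]  # (prompt, response) -> revised response


def build_sft_pairs(
    prompts: List[str], policy: Policy, reviser: Reviser
) -> List[Tuple[str, str]]:
    """Turn unannotated prompts into (prompt, enhanced response) training pairs."""
    pairs = []
    for prompt in prompts:
        draft = policy(prompt)            # the policy's own response
        revised = reviser(prompt, draft)  # may return the draft unchanged
        pairs.append((prompt, revised))
    return pairs


def toy_policy(prompt: str) -> str:
    # Stand-in for a language model: always answers tersely.
    return "ok"


def toy_reviser(prompt: str, response: str) -> str:
    # Stand-in for the adaptive reviser. Assumption: a draft shorter than
    # three words counts as low quality and is rewritten; anything else is kept.
    if len(response.split()) < 3:
        return f"Here is a fuller answer to: {prompt}"
    return response


pairs = build_sft_pairs(["What is SFT?"], toy_policy, toy_reviser)
```

The resulting `pairs` would then serve as the supervised targets for an ordinary fine-tuning run on the policy, which is where the stability and efficiency of SFT mentioned in the abstract would come from.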

Taxonomy

Topics: Simulation Techniques and Applications

Methods: Shrink and Fine-Tune