Enabling Language Models to Implicitly Learn Self-Improvement

Ziqi Wang; Le Hou; Tianjian Lu; Yuexin Wu; Yunxuan Li; Hongkun Yu; Heng Ji

arXiv:2310.00898·cs.CL·September 16, 2024·1 cites

Open Access

TL;DR

This paper introduces PIT, a framework that enables large language models to implicitly learn self-improvement goals from human preference data, reducing manual effort and enhancing response quality.

Contribution

PIT reformulates reinforcement learning from human feedback to implicitly learn improvement goals without explicit rubrics, outperforming prompting-based methods.
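As a rough illustration of how preference data can act as a training signal without a written rubric, the sketch below computes a standard pairwise (Bradley-Terry style) loss from two scalar scores. This is not PIT's actual objective, which this page does not specify. The function name and the idea of feeding it scores from some learned scorer are assumptions made for illustration only.

```python
import math


def pairwise_preference_loss(score_preferred: float, score_rejected: float) -> float:
    """Hypothetical helper: -log(sigmoid(score_preferred - score_rejected)).

    The scores are assumed to come from some learned scorer; the loss is
    small when the human-preferred response is scored higher than the
    rejected one, and large when the ordering is reversed.
    """
    margin = score_preferred - score_rejected
    # Numerically stable softplus(-margin), which equals -log(sigmoid(margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Illustrative scores only; no real model or dataset is involved.
    print(pairwise_preference_loss(2.0, 0.0))  # preferred scored higher: lower loss
    print(pairwise_preference_loss(0.0, 2.0))  # ordering reversed: higher loss
```

The only supervision in this loss is which of two responses was preferred, so whatever criteria the annotators applied are absorbed through the comparisons rather than spelled out in a prompt.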

Findings

01 PIT significantly outperforms prompting-based methods on multiple datasets.

02 It effectively learns improvement goals from preference data without manual rubric design.

03 The approach reduces human annotation efforts in training LLMs.

Abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities in open-ended text generation tasks. However, the inherent open-ended nature of these tasks implies that there is always room for improvement in the quality of model responses. To address this challenge, various approaches have been proposed to enhance the performance of LLMs. There has been a growing focus on enabling LLMs to self-improve their response quality, thereby reducing the reliance on extensive human annotation efforts for collecting diverse and high-quality training data. Recently, prompting-based methods have been widely explored among self-improvement methods owing to their effectiveness, efficiency, and convenience. However, those methods usually require explicitly and thoroughly written rubrics as inputs to LLMs. It is expensive and challenging to manually derive and provide all necessary rubrics with…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Software Engineering Research

Methods: Focus