Data-Efficient Alignment of Large Language Models with Human Feedback Through Natural Language

Di Jin; Shikib Mehri; Devamanyu Hazarika; Aishwarya Padmakumar; Sungjin Lee; Yang Liu; Mahdi Namazifar

arXiv:2311.14543·cs.CL·November 27, 2023·2 cites

TL;DR

Fine-tuning an open-source LLM on a small amount of human feedback in natural language (critiques and revisions of responses) yields a model that can improve response quality, including for responses from some of the strongest LLMs.

Contribution

The paper introduces a data-efficient method for aligning LLMs with natural language feedback, requiring only a small dataset (1000 records or fewer) of critiques and revisions of responses.

Findings

01. Fine-tuning with natural language feedback improves LLM responses.

02. Revised responses outperform the original ones, with a win rate of up to 65.9%.

03. The method achieves competitive results with minimal human feedback data.

Abstract

Learning from human feedback is a prominent technique to align the output of large language models (LLMs) with human expectations. Reinforcement learning from human feedback (RLHF) leverages human preference signals that are in the form of ranking of response pairs to perform this alignment. However, human preference on LLM outputs can come in much richer forms including natural language, which may provide detailed feedback on strengths and weaknesses of a given response. In this work we investigate data efficiency of modeling human feedback that is in natural language. Specifically, we fine-tune an open-source LLM, e.g., Falcon-40B-Instruct, on a relatively small amount (1000 records or even less) of human feedback in natural language in the form of critiques and revisions of responses. We show that this model is able to improve the quality of responses from even some of the strongest…
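As a rough illustration of the setup the abstract describes, the sketch below shows one way a critique-and-revision record could be turned into a supervised fine-tuning pair, and how a pairwise win rate could be computed. The record fields, the prompt template, and the treatment of ties are all assumptions made for illustration; the paper's actual data format and evaluation protocol are not given on this page.

```python
from dataclasses import dataclass


@dataclass
class FeedbackRecord:
    """One human-feedback record. All field names are hypothetical."""
    prompt: str      # user request
    response: str    # original model response
    critique: str    # natural language feedback on the response
    revision: str    # improved response written in light of the critique


def to_training_example(record: FeedbackRecord) -> dict:
    """Build a (source, target) pair for supervised fine-tuning.

    Assumed template: the model reads the prompt and the original
    response, then generates the critique followed by the revision.
    """
    source = (
        f"Prompt: {record.prompt}\n"
        f"Response: {record.response}\n"
        "Critique:"
    )
    target = f" {record.critique}\nRevision: {record.revision}"
    return {"source": source, "target": target}


def win_rate(judgments: list[str]) -> float:
    """Fraction of pairwise comparisons won by the revised response.

    Each judgment is "revised", "original", or "tie". Ties are counted
    as non-wins here, which is an assumption, not the paper's protocol.
    """
    if not judgments:
        raise ValueError("judgments must not be empty")
    return sum(j == "revised" for j in judgments) / len(judgments)
```

With a dataset of roughly 1000 such records, each one would be mapped through `to_training_example` before fine-tuning, and `win_rate` would be applied to the judgments collected when comparing revised responses against the originals.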

Taxonomy

Topics: Topic Modeling · Software Engineering Research · Natural Language Processing Techniques

Methods: ALIGN