Data-Efficient Alignment of Large Language Models with Human Feedback Through Natural Language
Di Jin, Shikib Mehri, Devamanyu Hazarika, Aishwarya Padmakumar, Sungjin Lee, Yang Liu, Mahdi Namazifar

TL;DR
This paper demonstrates that fine-tuning open-source large language models with a small amount of natural language human feedback enables significant improvements in response quality, rivaling top commercial models.
Contribution
It introduces a data-efficient method for aligning LLMs using natural language feedback, requiring only a small dataset of critiques and revisions.
Findings
Fine-tuning with natural language feedback improves LLM responses.
Revised responses outperform original ones with up to 65.9% win rate.
The method achieves competitive results with minimal human feedback data.
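As a rough illustration of what a win rate such as the one quoted above measures, the sketch below computes the fraction of pairwise comparisons in which the revised response was preferred. The function name `win_rate`, the label strings, and the choice to count ties as non-wins are assumptions made for this example; the summary does not state how ties were handled. The toy judgments are made up and are not data from the paper.

```python
from collections import Counter


def win_rate(judgments):
    """Fraction of comparisons in which the revised response was preferred.

    `judgments` is a list of "revised", "original", or "tie" labels.
    Ties count as non-wins here, which is an assumption of this sketch.
    """
    if not judgments:
        raise ValueError("need at least one judgment")
    counts = Counter(judgments)
    return counts["revised"] / len(judgments)


# Toy, made-up judgments for illustration only.
toy_judgments = ["revised", "revised", "original", "tie"]
print(f"{win_rate(toy_judgments):.1%}")
```

Under a different convention (for example, dropping ties before dividing), the same judgments would give a different figure, so the tie rule should be reported alongside any win rate.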
Abstract
Learning from human feedback is a prominent technique to align the output of large language models (LLMs) with human expectations. Reinforcement learning from human feedback (RLHF) leverages human preference signals that are in the form of ranking of response pairs to perform this alignment. However, human preference on LLM outputs can come in much richer forms including natural language, which may provide detailed feedback on strengths and weaknesses of a given response. In this work we investigate data efficiency of modeling human feedback that is in natural language. Specifically, we fine-tune an open-source LLM, e.g., Falcon-40B-Instruct, on a relatively small amount (1000 records or even less) of human feedback in natural language in the form of critiques and revisions of responses. We show that this model is able to improve the quality of responses from even some of the strongest…
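The abstract describes training data made of critiques and revisions of responses. A minimal sketch of how one such record might be turned into a supervised fine-tuning pair is shown below. The `FeedbackRecord` fields, the `to_training_pair` helper, and the `###` prompt template are all hypothetical; the abstract does not specify the actual format used.

```python
from dataclasses import dataclass


@dataclass
class FeedbackRecord:
    """One natural-language feedback example (field names are hypothetical)."""
    prompt: str      # user request given to the LLM
    response: str    # original model response
    critique: str    # human-written strengths and weaknesses
    revision: str    # human-written improved response


def to_training_pair(rec: FeedbackRecord) -> tuple[str, str]:
    """Build (source, target) text for supervised fine-tuning.

    The model sees the prompt and original response, and learns to
    produce the critique followed by the revision. The template is an
    assumption of this sketch, not the paper's format.
    """
    source = (
        f"### Prompt:\n{rec.prompt}\n\n"
        f"### Response:\n{rec.response}\n\n"
        f"### Critique:\n"
    )
    target = f"{rec.critique}\n\n### Revision:\n{rec.revision}"
    return source, target


# Made-up record for illustration only.
example = FeedbackRecord(
    prompt="Explain what a hash map is.",
    response="A hash map stores things.",
    critique="Correct but too vague; it omits keys, values, and hashing.",
    revision="A hash map stores key-value pairs and uses a hash of the key to find each value quickly.",
)
source, target = to_training_pair(example)
print(source + target)
```

Generating the critique before the revision lets the revision condition on the critique's text; whether the paper orders them this way is not stated in the abstract.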
Taxonomy
Topics: Topic Modeling · Software Engineering Research · Natural Language Processing Techniques
Methods: ALIGN
