Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike

TL;DR
This paper demonstrates that fine-tuning language models with human feedback significantly improves their alignment with user intent, truthfulness, and safety, even with smaller models.
Contribution
It introduces a method for aligning language models with human preferences using supervised fine-tuning and reinforcement learning from human feedback, resulting in the InstructGPT models.
Findings
InstructGPT outperforms larger GPT-3 models in human preference tests.
Fine-tuning with human feedback improves truthfulness and reduces toxicity.
Smaller models can match or exceed larger models' performance through this method.
Abstract
Making language models bigger does not inherently make them better at following a user's intent. For example, large language models can generate outputs that are untruthful, toxic, or simply not helpful to the user. In other words, these models are not aligned with their users. In this paper, we show an avenue for aligning language models with user intent on a wide range of tasks by fine-tuning with human feedback. Starting with a set of labeler-written prompts and prompts submitted through the OpenAI API, we collect a dataset of labeler demonstrations of the desired model behavior, which we use to fine-tune GPT-3 using supervised learning. We then collect a dataset of rankings of model outputs, which we use to further fine-tune this supervised model using reinforcement learning from human feedback. We call the resulting models InstructGPT. In human evaluations on our prompt…
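The abstract describes collecting rankings of model outputs and using them for reinforcement learning from human feedback, but it does not spell out how rankings become a training signal. One common approach, sketched here as an assumption rather than as the authors' exact implementation, is to train a reward model with a pairwise logistic loss: for every pair of outputs in a ranking, the preferred output should receive the higher scalar reward. The function names and the example reward values below are hypothetical.

```python
import math
from itertools import combinations


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def pairwise_ranking_loss(rewards_best_to_worst: list[float]) -> float:
    """Mean of -log(sigmoid(r_preferred - r_other)) over all pairs.

    `rewards_best_to_worst` holds the reward model's scalar scores for the
    outputs of ONE prompt, ordered by the labeler's ranking (best first).
    This is an illustrative assumption, not the paper's code.
    """
    # combinations() preserves input order, so the first element of each
    # pair is always the output the labeler ranked higher.
    pairs = list(combinations(rewards_best_to_worst, 2))
    if not pairs:
        raise ValueError("need at least two ranked outputs")
    return -sum(log_sigmoid(r_w - r_l) for r_w, r_l in pairs) / len(pairs)


# Hypothetical scores: the first list agrees with the labeler's ranking,
# the second list contradicts it.
agrees = pairwise_ranking_loss([2.0, 1.0, 0.0])
contradicts = pairwise_ranking_loss([0.0, 1.0, 2.0])
print(agrees < contradicts)
```

The loss is small when the reward model's scores agree with the labeler's ordering and large when they contradict it. In an RLHF setup, a reward model trained this way would then supply the scalar reward that the reinforcement learning step optimizes.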
Code & Models
- 🤗 TurkuNLP/gpt3-finnish-small · model · 1.2k dl · ♡ 13
- 🤗 TurkuNLP/gpt3-finnish-medium · model · 12 dl
- 🤗 TurkuNLP/gpt3-finnish-large · model · 905 dl · ♡ 9
- 🤗 TurkuNLP/gpt3-finnish-xl · model · 354 dl · ♡ 8
- 🤗 TurkuNLP/gpt3-finnish-3B · model · 47 dl · ♡ 2
- 🤗 TurkuNLP/gpt3-finnish-8B · model · 8 dl · ♡ 2
- 🤗 TurkuNLP/gpt3-finnish-13B · model · 852 dl · ♡ 14
- 🤗 rinna/japanese-gpt-neox-3.6b-instruction-ppo · model · 836 dl · ♡ 74
- 🤗 rinna/bilingual-gpt-neox-4b-instruction-ppo · model · 15 dl · ♡ 14
- 🤗 DS-Archive/no-robots-y34b-lora · model · 4 dl · ♡ 5
Videos
Reinforcement Learning with Human Feedback (RLHF), Clearly Explained!!! · YouTube
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Cosine Annealing · Attention Dropout · Dropout · Linear Warmup With Cosine Annealing · Dense Connections · Residual Connection
