Learning to summarize from human feedback
Nisan Stiennon, Long Ouyang, Jeff Wu, Daniel M. Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul Christiano

TL;DR
This paper shows that training language models with reinforcement learning on human preference data yields significantly better summaries than supervised training on reference summaries, and that ROUGE is only a rough proxy for summary quality.
Contribution
The paper introduces a method that fine-tunes a summarization policy with reinforcement learning, using as the reward function a model trained to predict which of two summaries humans prefer.
Findings
On the TL;DR dataset of Reddit posts, the models outperform both the human reference summaries and larger models trained with supervised learning.
The models transfer well to new datasets such as CNN/DM.
Optimizing for human preferences yields better summaries than optimizing for ROUGE.
Abstract
As language models become more powerful, training and evaluation are increasingly bottlenecked by the data and metrics used for a particular task. For example, summarization models are often trained to predict human reference summaries and evaluated using ROUGE, but both of these metrics are rough proxies for what we really care about -- summary quality. In this work, we show that it is possible to significantly improve summary quality by training a model to optimize for human preferences. We collect a large, high-quality dataset of human comparisons between summaries, train a model to predict the human-preferred summary, and use that model as a reward function to fine-tune a summarization policy using reinforcement learning. We apply our method to a version of the TL;DR dataset of Reddit posts and find that our models significantly outperform both human reference summaries and much…
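The reward-model step in the abstract ("train a model to predict the human-preferred summary") can be illustrated with a pairwise logistic loss over scalar rewards. This is a minimal sketch, not the authors' implementation: the abstract excerpt above does not spell out the loss, and the function names `pairwise_preference_loss` and `mean_preference_loss` are hypothetical. It assumes the reward model has already assigned one scalar score to each summary in a comparison.

```python
import math


def pairwise_preference_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Return -log(sigmoid(reward_preferred - reward_rejected)).

    Assumption for illustration: each argument is the scalar score a reward
    model gave to one summary of the same post, and a human labeler picked
    the first summary as the better one.
    """
    diff = reward_preferred - reward_rejected
    # Numerically stable form of log(1 + exp(-diff)).
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


def mean_preference_loss(pairs: list[tuple[float, float]]) -> float:
    """Average the pairwise loss over (preferred, rejected) reward pairs."""
    if not pairs:
        raise ValueError("pairs must not be empty")
    return sum(pairwise_preference_loss(p, r) for p, r in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical reward scores for three human comparisons.
    comparisons = [(1.2, 0.3), (0.1, 0.4), (2.0, -1.0)]
    print(f"mean loss: {mean_preference_loss(comparisons):.4f}")
```

The loss is small when the reward model scores the human-preferred summary well above the rejected one, and it equals log 2 (about 0.693) when the two scores are tied. Minimizing it therefore pushes the reward model to agree with the human comparisons.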
Videos
Learning to summarize from human feedback (Paper Explained) · YouTube
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
