Self-critiquing models for assisting human evaluators

William Saunders; Catherine Yeh; Jeff Wu; Steven Bills; Long Ouyang; Jonathan Ward; Jan Leike

arXiv:2206.05802 · cs.CL · June 15, 2022 · 46 cites

TL;DR

This paper shows that large language models fine-tuned to write critiques can help human evaluators find flaws in summaries that they would otherwise miss. Larger models write more helpful critiques and can use their own critiques to refine their summaries.

Contribution

It introduces a method for fine-tuning models to produce critiques that aid human evaluation and explores the scaling properties and self-critique capabilities of these models.

Findings

01. Larger models generate more helpful critiques.

02. Models improve their summaries by integrating self-critiques.

03. Critiquing ability correlates with, but is distinct from, generation and discrimination abilities.

Abstract

We fine-tune large language models to write natural language critiques (natural language critical comments) using behavioral cloning. On a topic-based summarization task, critiques written by our models help humans find flaws in summaries that they would have otherwise missed. Our models help find naturally occurring flaws in both model and human written summaries, and intentional flaws in summaries written by humans to be deliberately misleading. We study scaling properties of critiquing with both topic-based summarization and synthetic tasks. Larger models write more helpful critiques, and on most tasks, are better at self-critiquing, despite having harder-to-critique outputs. Larger models can also integrate their own self-critiques as feedback, refining their own summaries into better ones. Finally, we motivate and introduce a framework for comparing critiquing ability to generation…
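
The generate, critique, and refine steps described in the abstract can be sketched as a small loop. This is only an illustration of the control flow, not the authors' implementation: the function name `refine_with_self_critique`, its argument layout, and the three `toy_*` stand-ins are hypothetical, and in practice each step would be a call to a fine-tuned language model rather than string manipulation.

```python
from typing import Callable, Dict

# Hypothetical signatures for the three model roles (assumptions, not from the paper).
Generate = Callable[[str, str], str]            # (passage, question) -> summary
Critique = Callable[[str, str, str], str]       # (passage, question, summary) -> critique
Refine = Callable[[str, str, str, str], str]    # (passage, question, summary, critique) -> summary


def refine_with_self_critique(
    passage: str,
    question: str,
    generate: Generate,
    critique: Critique,
    refine: Refine,
) -> Dict[str, str]:
    """Run one generate -> self-critique -> refine pass and return every stage."""
    summary = generate(passage, question)
    critique_text = critique(passage, question, summary)
    # An empty critique means no flaw was reported, so the summary is kept as-is.
    refined = refine(passage, question, summary, critique_text) if critique_text else summary
    return {"summary": summary, "critique": critique_text, "refined": refined}


# Toy stand-ins so the sketch runs without any model (purely illustrative).
PREFIX = "The summary omits: "


def toy_generate(passage: str, question: str) -> str:
    # Deliberately flawed "summary": keep only the first sentence.
    return passage.split(". ")[0].rstrip(".") + "."


def toy_critique(passage: str, question: str, summary: str) -> str:
    # Report the first passage sentence that the summary leaves out.
    for sentence in passage.split(". "):
        core = sentence.rstrip(".")
        if core and core not in summary:
            return PREFIX + core + "."
    return ""


def toy_refine(passage: str, question: str, summary: str, critique_text: str) -> str:
    # Address the critique by appending the omitted sentence.
    return summary + " " + critique_text[len(PREFIX):]
```

Calling `refine_with_self_critique` with the three toy functions on a two-sentence passage yields a first summary that drops the second sentence, a critique naming the omission, and a refined summary that restores it. Passing the roles as callables keeps the loop independent of any particular model API.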

Code & Models

Repositories

feyzaakyurek/rl4f
pytorch

Taxonomy

Topics: Topic Modeling · Software Engineering Research · Natural Language Processing Techniques