Teaching language models to support answers with verified quotes

Jacob Menick; Maja Trebacz; Vladimir Mikulik; John Aslanides; Francis Song; Martin Chadwick; Mia Glaese; Susannah Young; Lucy Campbell-Gillingham; Geoffrey Irving; Nat McAleese

arXiv:2203.11147 · cs.CL · March 22, 2022 · 53 citations

Open Access

TL;DR

This paper introduces GopherCite, a large language model trained with reinforcement learning to generate factually supported answers with citations, improving trustworthiness and allowing abstention on uncertain questions.

Contribution

It presents a novel training method combining reinforcement learning and evidence citation for large language models to enhance answer reliability.

Findings

01

Human raters judge GopherCite's answers to be high quality 80% of the time on a subset of NaturalQuestions.

02

On a subset of ELI5 the rate is 67%. When the model abstains on the questions it is least sure of, the rates rise to 90% on NaturalQuestions and 80% on ELI5.

03

Citation alone does not guarantee truthfulness, highlighting safety challenges.

Abstract

Recent large language models often answer factual questions correctly. But users can't trust any given claim a model makes without fact-checking, because language models can hallucinate convincing nonsense. In this work we use reinforcement learning from human preferences (RLHP) to train "open-book" QA models that generate answers whilst also citing specific evidence for their claims, which aids in the appraisal of correctness. Supporting evidence is drawn from multiple documents found via a search engine, or from a single user-provided document. Our 280 billion parameter model, GopherCite, is able to produce answers with high quality supporting evidence and abstain from answering when unsure. We measure the performance of GopherCite by conducting human evaluation of answers to questions in a subset of the NaturalQuestions and ELI5 datasets. The model's response is found to be…
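The two behaviours the abstract describes, quoting evidence that can be checked against a source and declining to answer when unsure, can be illustrated with a minimal sketch. This is not the paper's implementation: the `CitedAnswer` type, the `score` field, the whitespace normalisation, and the threshold rule are all assumptions made for illustration. The sketch only shows what it means for a quote to be verifiable (it appears word for word in the document) and how a confidence cutoff gives a simple abstention rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CitedAnswer:
    """Hypothetical container for a model response (not from the paper)."""
    answer: str
    quote: str    # evidence the response claims to quote from the document
    score: float  # assumed confidence score, higher means more confident


def normalize(text: str) -> str:
    """Collapse runs of whitespace so line breaks do not defeat the match."""
    return " ".join(text.split())


def quote_is_verbatim(quote: str, document: str) -> bool:
    """Return True if the non-empty quote appears word for word in the document."""
    q = normalize(quote)
    return bool(q) and q in normalize(document)


def answer_or_abstain(
    candidate: CitedAnswer, document: str, threshold: float
) -> Optional[CitedAnswer]:
    """Return the candidate, or None to signal abstention.

    Abstains when the quote cannot be found in the document or when the
    assumed confidence score falls below the threshold.
    """
    if not quote_is_verbatim(candidate.quote, document):
        return None
    if candidate.score < threshold:
        return None
    return candidate


# Usage with a made-up document and made-up scores.
document = "The Eiffel Tower is located in Paris.\nIt was completed in 1889."
supported = CitedAnswer("1889", "It was completed in 1889.", score=0.9)
fabricated = CitedAnswer("1890", "It was completed in 1890.", score=0.9)

print(answer_or_abstain(supported, document, threshold=0.5))
print(answer_or_abstain(fabricated, document, threshold=0.5))
```

Raising the threshold makes the sketch abstain more often. That is the same trade-off the reported results reflect: quality on the answered questions improves when the least certain questions are skipped.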

Taxonomy

Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Natural Language Processing Techniques