Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese, Nat McAleese, Maja Trębacz, John Aslanides, Vlad Firoiu, Timo Ewalds, Maribeth Rauh, Laura Weidinger, Martin Chadwick, Phoebe Thacker, Lucy Campbell-Gillingham, Jonathan Uesato, Po-Sen Huang, Ramona Comanescu, Fan Yang, Abigail See, Sumanth Dathathri

TL;DR
This paper introduces Sparrow, a dialogue agent trained with human feedback and rule-based guidance to be more helpful, correct, and harmless, demonstrating improved factual accuracy and resilience to adversarial probing.
Contribution
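The paper extends reinforcement learning from human feedback for dialogue agents with two additions: rule-based guidance, in which raters judge the agent against each natural language rule separately, and evidence provision, in which the agent cites sources supporting its factual claims.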
Findings
For factual questions, the evidence Sparrow provides supports its sampled response 78% of the time.
Human raters prefer Sparrow over prompted language model baselines.
Sparrow violates rules only 8% of the time under adversarial probing.
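Both headline percentages are rates over human rater judgements, not automatic metrics. The sketch below shows the arithmetic under that reading; the function name `rate` and the counts are made up for illustration and are not the paper's data or sample sizes.

```python
from typing import Sequence

def rate(judgements: Sequence[bool]) -> float:
    """Fraction of rater judgements that are True, e.g. 'the evidence supports
    the response' for factual questions, or 'a rule was broken' for adversarial
    dialogues."""
    if len(judgements) == 0:
        raise ValueError("need at least one judgement")
    return sum(judgements) / len(judgements)

# Made-up counts for illustration only (not the paper's data).
evidence_supported = [True] * 39 + [False] * 11  # 50 hypothetical factual questions
rule_broken = [True] * 8 + [False] * 92          # 100 hypothetical adversarial dialogues

print(rate(evidence_supported))  # 0.78
print(rate(rule_broken))         # 0.08
```

Note that the two rates point in opposite directions: a higher evidence-support rate is better, while a lower rule-violation rate is better.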
Abstract
We present Sparrow, an information-seeking dialogue agent trained to be more helpful, correct, and harmless compared to prompted language model baselines. We use reinforcement learning from human feedback to train our models with two new additions to help human raters judge agent behaviour. First, to make our agent more helpful and harmless, we break down the requirements for good dialogue into natural language rules the agent should follow, and ask raters about each rule separately. We demonstrate that this breakdown enables us to collect more targeted human judgements of agent behaviour and allows for more efficient rule-conditional reward models. Second, our agent provides evidence from sources supporting factual claims when collecting preference judgements over model statements. For factual questions, evidence provided by Sparrow supports the sampled response 78% of the time.…
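The rule breakdown described above can be pictured as follows: instead of one overall "was this good?" question, a rater answers a separate yes/no question per rule, and a single rule-conditional model scores (dialogue, rule) pairs. The sketch below is a minimal illustration under stated assumptions. The rule texts are placeholders, `ask_per_rule`, `combined_reward`, and `penalty` are hypothetical names, and subtracting expected violations from a preference score is one simple combination chosen for illustration, not the paper's reward formula.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Placeholder rules for illustration only; the paper's rule set differs.
RULES: List[str] = [
    "Do not make threatening statements.",
    "Do not pretend to have a human body.",
    "Do not give medical advice.",
]

@dataclass
class RuleJudgement:
    """One targeted judgement: a rater answering about a single rule."""
    rule: str
    violated: bool

def ask_per_rule(answers: Dict[str, bool]) -> List[RuleJudgement]:
    """Split a rater's answers into one judgement per rule (unanswered rules are skipped)."""
    return [RuleJudgement(rule, answers[rule]) for rule in RULES if rule in answers]

# Stand-in for a rule-conditional model: one callable scores any (dialogue, rule)
# pair and returns the probability that the dialogue breaks that rule.
RuleModel = Callable[[str, str], float]

def combined_reward(preference_score: float, dialogue: str,
                    rule_model: RuleModel, penalty: float = 1.0) -> float:
    """Hypothetical reward: preference score minus a penalty per expected violation."""
    expected_violations = sum(rule_model(dialogue, rule) for rule in RULES)
    return preference_score - penalty * expected_violations

def toy_rule_model(dialogue: str, rule: str) -> float:
    # Toy stub: flags only the threat rule, and only when "threat" appears.
    return 0.5 if "threat" in dialogue and "threatening" in rule else 0.0

print(combined_reward(1.0, "I will make a threat.", toy_rule_model))  # 0.5
```

Conditioning one model on the rule text, rather than training a separate classifier per rule, lets judgements about every rule share the same model.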
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
