Learning New Skills after Deployment: Improving open-domain internet-driven dialogue with human feedback

Jing Xu; Megan Ung; Mojtaba Komeili; Kushal Arora; Y-Lan Boureau; Jason Weston

arXiv:2208.03270·cs.CL·August 17, 2022·6 cites

TL;DR

This paper studies how open-domain dialogue models that use internet retrieval can improve from human feedback collected during deployment, and reports that the Director model shows significant improvements over the other approaches compared.

Contribution

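The authors collect and publicly release deployment data of human interactions, annotated with several types of feedback (binary quality measurements, free-form text, and fine-grained reasons for failure). They then compare algorithms for learning from that feedback (standard supervised learning, rejection sampling, model-guiding, and reward-based learning) in order to recommend which feedback types and algorithms work best.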

Findings

01

Director model outperforms other approaches

02

Human feedback improves dialogue quality

03

Rejection sampling and reward-based learning are effective

Abstract

Frozen models trained to mimic static datasets can never improve their performance. Models that can employ internet-retrieval for up-to-date information and obtain feedback from humans during deployment provide the promise of both adapting to new information, and improving their performance. In this work we study how to improve internet-driven conversational skills in such a learning framework. We collect deployment data, which we make publicly available, of human interactions, and collect various types of human feedback -- including binary quality measurements, free-form text feedback, and fine-grained reasons for failure. We then study various algorithms for improving from such feedback, including standard supervised learning, rejection sampling, model-guiding and reward-based learning, in order to make recommendations on which type of feedback and algorithms work best. We find the…
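As a rough illustration of one of the algorithm families the abstract names, rejection sampling can be sketched as: draw several candidate replies, score each with a reward model, and keep the best one. The snippet below is a minimal sketch under stated assumptions, not the paper's implementation. The names `rejection_sample`, `toy_generator`, and `toy_score` are hypothetical, and the word-overlap scorer merely stands in for a reward model that would, in practice, be trained on the binary quality feedback.

```python
from typing import Callable, Iterable


def rejection_sample(
    context: str,
    generate: Callable[[str], str],
    score: Callable[[str, str], float],
    num_candidates: int = 4,
) -> str:
    """Sample several replies and return the one the scorer rates highest."""
    if num_candidates < 1:
        raise ValueError("num_candidates must be at least 1")
    candidates = [generate(context) for _ in range(num_candidates)]
    return max(candidates, key=lambda reply: score(context, reply))


def toy_generator(replies: Iterable[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for a dialogue model: yields canned replies in order."""
    iterator = iter(replies)
    return lambda context: next(iterator)


def toy_score(context: str, reply: str) -> float:
    """Hypothetical stand-in for a learned reward model.

    Scores a reply by the fraction of context words it repeats.
    """
    context_words = set(context.lower().split())
    reply_words = set(reply.lower().split())
    return len(context_words & reply_words) / max(len(context_words), 1)


if __name__ == "__main__":
    generate = toy_generator(
        ["I like tea", "the cup was won yesterday", "hello"]
    )
    best = rejection_sample(
        "who won the cup", generate, toy_score, num_candidates=3
    )
    print(best)
```

In this toy setup the second canned reply is selected because it shares the most words with the context. With a real generator and a reward model trained on human feedback, the same selection loop applies unchanged.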

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · Expert finding and Q&A systems