Learning New Skills after Deployment: Improving open-domain internet-driven dialogue with human feedback
Jing Xu, Megan Ung, Mojtaba Komeili, Kushal Arora, Y-Lan Boureau, Jason Weston

TL;DR
This paper explores how internet retrieval and human feedback during deployment can enhance open-domain dialogue models, demonstrating that the Director model significantly outperforms other methods in improving conversational skills.
Contribution
It introduces a framework for collecting deployment data and feedback, and evaluates various algorithms, highlighting the effectiveness of the Director model for online learning.
Findings
Director model outperforms other approaches
Human feedback improves dialogue quality
Rejection sampling and reward-based learning are effective
Abstract
Frozen models trained to mimic static datasets can never improve their performance. Models that can employ internet-retrieval for up-to-date information and obtain feedback from humans during deployment provide the promise of both adapting to new information, and improving their performance. In this work we study how to improve internet-driven conversational skills in such a learning framework. We collect deployment data, which we make publicly available, of human interactions, and collect various types of human feedback -- including binary quality measurements, free-form text feedback, and fine-grained reasons for failure. We then study various algorithms for improving from such feedback, including standard supervised learning, rejection sampling, model-guiding and reward-based learning, in order to make recommendations on which type of feedback and algorithms work best. We find the…
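Of the algorithms the abstract lists, rejection sampling is the simplest to illustrate. The sketch below shows one common form of it: draw several candidate responses, score each with a reward function, and keep the highest-scoring one. It is a minimal illustration under stated assumptions and may not match the authors' setup. The names `rejection_sample`, `make_toy_generator`, and `toy_reward` are hypothetical, and the word-overlap reward is only a stand-in for a model that would, in practice, be trained on the collected human feedback.

```python
from itertools import cycle
from typing import Callable, List, Tuple


def rejection_sample(
    context: str,
    generate: Callable[[str], str],
    reward: Callable[[str, str], float],
    num_candidates: int = 4,
) -> Tuple[str, float]:
    """Draw candidates from `generate` and return the one `reward` scores highest."""
    if num_candidates < 1:
        raise ValueError("num_candidates must be at least 1")
    candidates = [generate(context) for _ in range(num_candidates)]
    scores = [reward(context, candidate) for candidate in candidates]
    # Pick the index of the best score; ties resolve to the earliest candidate.
    best_index = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best_index], scores[best_index]


def make_toy_generator(responses: List[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for a dialogue model: cycles through fixed responses."""
    pool = cycle(responses)
    return lambda context: next(pool)


def toy_reward(context: str, response: str) -> float:
    """Hypothetical stand-in for a learned reward model.

    Returns the fraction of distinct response words that also occur in the context.
    """
    context_words = set(context.lower().split())
    response_words = set(response.lower().split())
    if not response_words:
        return 0.0
    return len(context_words & response_words) / len(response_words)


if __name__ == "__main__":
    generator = make_toy_generator(
        ["I like pizza", "the world cup was won by a team", "no idea"]
    )
    best, score = rejection_sample(
        "who won the world cup", generator, toy_reward, num_candidates=3
    )
    print(best, score)
```

Keeping the generator and the reward as plain callables means either can be swapped for a real model without changing the selection logic.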
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Expert finding and Q&A systems
