TL;DR
This paper introduces feedback-weighted learning, an importance-sampling method that uses binary user feedback to improve conversational question answering systems after deployment. In simulated experiments it improves over the initial supervised system in both in-domain and out-of-domain settings.
Contribution
The paper presents a method for improving conversational QA systems after deployment that combines binary (correct/incorrect) user feedback with importance sampling.
Findings
Method improves over initial supervised systems
Achieves performance close to fully-supervised models in-domain
Matches fully-supervised performance out-of-domain
Abstract
The interaction of conversational systems with users poses an exciting opportunity for improving them after deployment, but little evidence has been provided of its feasibility. In most applications, users are not able to provide the correct answer to the system, but they are able to provide binary (correct, incorrect) feedback. In this paper we propose feedback-weighted learning based on importance sampling to improve upon an initial supervised system using binary user feedback. We perform simulated experiments on document classification (for development) and Conversational Question Answering datasets like QuAC and DoQA, where binary user feedback is derived from gold annotations. The results show that our method is able to improve over the initial supervised system, getting close to a fully-supervised system that has access to the same labeled examples in in-domain experiments (QuAC),…
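The idea of weighting binary feedback by importance sampling can be illustrated with a small sketch. This is a generic importance-weighted estimator under stated assumptions, not necessarily the paper's exact objective: the deployed system shows the user one answer drawn from its own distribution, the user returns 1 (correct) or 0 (incorrect), and that feedback is re-weighted by the ratio between the probability the model being updated assigns to the shown answer and the probability the deployed system assigned to it. All function names (`simulate_binary_feedback`, `feedback_weight`, `expected_weight`) and the toy logits are hypothetical; feedback is simulated from a gold label, as in the simulated experiments the abstract describes.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability distribution over candidate answers."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def simulate_binary_feedback(predicted, gold):
    """Simulated user: 1.0 if the shown answer matches the gold label, else 0.0."""
    return 1.0 if predicted == gold else 0.0


def feedback_weight(current_probs, deployed_probs, shown, feedback):
    """Importance-weighted feedback for a single interaction.

    `shown` is the answer index that the deployed system presented;
    the ratio corrects for the answer having been sampled from the
    deployed system rather than from the model being updated.
    """
    return feedback * current_probs[shown] / deployed_probs[shown]


def expected_weight(current_probs, deployed_probs, gold):
    """Exact expectation of the weight when answers are drawn from the deployed system."""
    return sum(
        deployed_probs[y]
        * feedback_weight(
            current_probs, deployed_probs, y, simulate_binary_feedback(y, gold)
        )
        for y in range(len(deployed_probs))
    )


# Toy setup (assumed values): three candidate answers, gold answer is index 1.
deployed = softmax([2.0, 0.5, 0.1])  # system that interacted with the user
current = softmax([1.0, 1.5, 0.2])   # model being updated from feedback
gold = 1
```

In this toy setup, `expected_weight(current, deployed, gold)` equals `current[gold]`: averaged over what the deployed system would show, the re-weighted feedback recovers the probability that the current model answers correctly, even though the user never reveals the gold answer.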
