TL;DR
This paper explores using simulated user feedback within a bandit learning framework to improve extractive question answering systems, enabling domain adaptation and reducing annotation needs.
Contribution
It introduces a bandit learning approach for extractive QA that leverages simulated user feedback to enhance performance and domain transfer without additional annotation.
Findings
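Systems initially trained on a small number of examples improve dramatically when given user feedback on model-predicted answers.
Existing datasets can be used to deploy systems in new domains without any annotation, with the system improving on the fly via user feedback.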
Feedback-driven learning reduces data annotation efforts.
Abstract
We study learning from user feedback for extractive question answering by simulating feedback using supervised data. We cast the problem as contextual bandit learning, and analyze the characteristics of several learning scenarios with focus on reducing data annotation. We show that systems initially trained on a small number of examples can dramatically improve given feedback from users on model-predicted answers, and that one can use existing datasets to deploy systems in new domains without any annotation, but instead improving the system on-the-fly via user feedback.
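The setup in the abstract (simulating feedback from supervised data and learning from it as a contextual bandit) can be illustrated with a small sketch. This is not the paper's implementation. The linear span scorer, the binary exact-match reward, the REINFORCE-style update, and every name below (`simulated_feedback`, `bandit_step`, `train`, the toy candidates) are assumptions chosen for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def simulated_feedback(predicted_span, gold_span):
    """Simulate a user from supervised data (assumed scheme):
    reward 1.0 if the predicted span equals the gold span, else 0.0."""
    return 1.0 if predicted_span == gold_span else 0.0


def span_probs(weights, candidates):
    """Policy: softmax over linear scores of each candidate span's features."""
    scores = [sum(w * f for w, f in zip(weights, feats)) for _, feats in candidates]
    return softmax(scores)


def bandit_step(weights, candidates, gold_span, lr, rng):
    """One bandit update: sample a span, observe the reward for that span
    only, and take a REINFORCE-style gradient step on its log-probability."""
    probs = span_probs(weights, candidates)
    idx = rng.choices(range(len(candidates)), weights=probs, k=1)[0]
    reward = simulated_feedback(candidates[idx][0], gold_span)
    # Gradient of log pi(a | x) for a linear softmax policy:
    # features of the chosen span minus the expected features under the policy.
    expected = [
        sum(p * feats[d] for p, (_, feats) in zip(probs, candidates))
        for d in range(len(weights))
    ]
    chosen = candidates[idx][1]
    new_weights = [w + lr * reward * (c - e) for w, c, e in zip(weights, chosen, expected)]
    return new_weights, reward


def train(candidates, gold_span, steps=200, lr=0.5, seed=0):
    """Run repeated bandit updates on one toy question; return final weights
    and the total simulated reward collected."""
    rng = random.Random(seed)
    weights = [0.0] * len(candidates[0][1])
    total_reward = 0.0
    for _ in range(steps):
        weights, reward = bandit_step(weights, candidates, gold_span, lr, rng)
        total_reward += reward
    return weights, total_reward


# Toy data (hypothetical): three candidate answer spans as (start, end)
# token indices, each with a one-hot feature vector.
CANDIDATES = [((0, 2), [1.0, 0.0, 0.0]), ((3, 5), [0.0, 1.0, 0.0]), ((6, 7), [0.0, 0.0, 1.0])]
GOLD = (3, 5)

if __name__ == "__main__":
    final_weights, total = train(CANDIDATES, GOLD)
    print("P(gold span):", span_probs(final_weights, CANDIDATES)[1])
    print("total simulated reward:", total)
```

The learner only sees the reward for the span it actually predicted, never the gold span itself, which is what separates this bandit setting from supervised training on the same data.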
