Simulating Bandit Learning from User Feedback for Extractive Question Answering

Ge Gao; Eunsol Choi; Yoav Artzi

arXiv:2203.10079 · cs.CL · March 21, 2022

TL;DR

This paper explores using simulated user feedback within a bandit learning framework to improve extractive question answering systems, enabling domain adaptation and reducing annotation needs.

Contribution

It casts learning from user feedback for extractive QA as contextual bandit learning, uses existing supervised data to simulate that feedback, and shows that the feedback improves performance and enables domain transfer without additional annotation.
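Simulating feedback from supervised data means deriving the reward a user would give from an answer that is already annotated. The snippet below is a minimal sketch of that idea, assuming a token-overlap F1 criterion with an approval threshold; the function names `token_f1` and `simulate_feedback`, the 0.6 threshold, and the reward values are hypothetical choices for illustration, not the paper's settings.

```python
from collections import Counter

def token_f1(pred: str, gold: str) -> float:
    """Token-overlap F1 between a predicted and an annotated answer (lowercased, whitespace-tokenized)."""
    pred_toks = pred.lower().split()
    gold_toks = gold.lower().split()
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(pred_toks) & Counter(gold_toks)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

def simulate_feedback(pred: str, gold: str, threshold: float = 0.6,
                      positive: float = 1.0, negative: float = 0.0) -> float:
    """Hypothetical simulated user: approves the shown answer if it overlaps enough with the gold span.

    The threshold and reward values are illustrative assumptions, not the paper's settings.
    """
    return positive if token_f1(pred, gold) >= threshold else negative

# Usage: the gold answer stays hidden from the learner; only the scalar reward is passed on.
print(simulate_feedback("the black cat", "the cat"))  # F1 = 0.8 -> 1.0
print(simulate_feedback("a dog", "the cat"))          # F1 = 0.0 -> 0.0
```

A more faithful simulator would likely also normalize punctuation and articles before comparing tokens, as the SQuAD evaluation script does.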

Findings

01

Systems initially trained on a small number of examples improve dramatically from user feedback on their predicted answers.

02

Systems trained on existing datasets can be deployed in new domains without new annotation and improved on-the-fly via user feedback.

03

Feedback-driven learning reduces the need for data annotation.

Abstract

We study learning from user feedback for extractive question answering by simulating feedback using supervised data. We cast the problem as contextual bandit learning, and analyze the characteristics of several learning scenarios with a focus on reducing data annotation. We show that systems initially trained on a small number of examples can dramatically improve given feedback from users on model-predicted answers, and that one can use existing datasets to deploy systems in new domains without any annotation, instead improving the system on-the-fly via user feedback.
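In the contextual bandit framing, the system observes a context (question and passage), shows one predicted answer, and learns only from the user's reaction to that answer; the correct span is never revealed to the learner. The sketch below illustrates this loop with a toy linear softmax policy over candidate spans and a generic REINFORCE-style policy-gradient update. The feature vectors, binary reward, learning rate, and all function names (`bandit_step`, `make_example`, `train`) are hypothetical; this is not the authors' model or training objective, which live in the lil-lab/bandit-qa repository.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def dot(w, phi):
    return sum(wi * xi for wi, xi in zip(w, phi))

def bandit_step(w, candidates, gold_idx, rng, lr=0.5):
    """One contextual-bandit round with a linear softmax policy over candidate spans.

    candidates: one feature vector per candidate span (the context).
    gold_idx: index of the annotated span, used ONLY to simulate the user's reward.
    """
    probs = softmax([dot(w, phi) for phi in candidates])
    # The system shows a single sampled answer; the user reacts only to that answer.
    a = rng.choices(range(len(candidates)), weights=probs)[0]
    # Simulated feedback (hypothetical binary reward): 1 if the shown span is correct, else 0.
    reward = 1.0 if a == gold_idx else 0.0
    # REINFORCE: the gradient of log pi(a) for a linear softmax is phi_a - E_pi[phi].
    expected = [sum(p * phi[j] for p, phi in zip(probs, candidates)) for j in range(len(w))]
    new_w = [w[j] + lr * reward * (candidates[a][j] - expected[j]) for j in range(len(w))]
    return new_w, reward

def make_example(rng, k=4):
    """Toy context: feature 0 marks the correct span, feature 1 is Gaussian noise."""
    gold = rng.randrange(k)
    cands = [[1.0 if i == gold else 0.0, rng.gauss(0.0, 1.0)] for i in range(k)]
    return cands, gold

def train(rounds=500, seed=0):
    """Run simulated deployment rounds and return the learned weights and average reward."""
    rng = random.Random(seed)
    w = [0.0, 0.0]
    total_reward = 0.0
    for _ in range(rounds):
        cands, gold = make_example(rng)
        w, r = bandit_step(w, cands, gold, rng)
        total_reward += r
    return w, total_reward / rounds

if __name__ == "__main__":
    w, avg_reward = train()
    print("weights:", w, "average reward:", avg_reward)
```

With a binary 0/1 reward, only approved answers move the weights, so the policy learns purely from positive feedback; a negative reward for rejected answers would also push probability away from wrong spans.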

Code & Models

Repositories

lil-lab/bandit-qa
pytorch · Official
