Improving a Neural Semantic Parser by Counterfactual Learning from Human Bandit Feedback

Carolin Lawrence; Stefan Riezler

arXiv:1805.01252·cs.CL·December 3, 2018

TL;DR

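This paper shows that counterfactual learning from logged human bandit feedback can significantly improve a neural semantic parser. It addresses how to reweight the estimator so that known degeneracies are avoided while stochastic gradient optimization remains applicable, and it describes an easy-to-use interface for collecting feedback on semantic parses.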

Contribution

It applies counterfactual learning from human bandit feedback to neural semantic parsing, and it addresses how to reweight the estimator so that known degeneracies are avoided while the objective remains usable with stochastic gradient optimization.

Findings

01 Semantic parsers improved significantly using human feedback

02 Developed an easy-to-use interface for collecting human feedback

03 Addressed estimator reweighting challenges in counterfactual learning

Abstract

Counterfactual learning from human bandit feedback describes a scenario where user feedback on the quality of outputs of a historic system is logged and used to improve a target system. We show how to apply this learning framework to neural semantic parsing. From a machine learning perspective, the key challenge lies in a proper reweighting of the estimator so as to avoid known degeneracies in counterfactual learning, while still being applicable to stochastic gradient optimization. To conduct experiments with human users, we devise an easy-to-use interface to collect human feedback on semantic parses. Our work is the first to show that semantic parsers can be improved significantly by counterfactual learning from logged human feedback data.
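
The reweighting issue mentioned in the abstract can be illustrated with a small sketch. It is an illustration only, not the paper's estimator: the abstract does not give the formula, so the code assumes a log with no recorded propensities and uses self-normalization, a common remedy in the counterfactual learning literature. The names `LogEntry`, `unweighted_estimate`, and `reweighted_estimate` are hypothetical, and the rewards and probabilities are made-up numbers.

```python
from typing import List, Tuple

# Hypothetical log record: (human reward in [0, 1],
# probability the target model currently assigns to the logged output).
LogEntry = Tuple[float, float]


def unweighted_estimate(log: List[LogEntry]) -> float:
    """Average of reward * target-model probability over the log."""
    return sum(r * p for r, p in log) / len(log)


def reweighted_estimate(log: List[LogEntry]) -> float:
    """Self-normalized variant: divide by the summed probabilities, not by n."""
    total_p = sum(p for _, p in log)
    if total_p == 0.0:
        return 0.0  # assumption: define the estimate as 0 when no mass is assigned
    return sum(r * p for r, p in log) / total_p


# Made-up example: one well-rated parse and one poorly rated parse.
log = [(1.0, 0.2), (0.1, 0.2)]
# The same log after the model doubles the probability of every logged output.
doubled = [(r, 2 * p) for r, p in log]

# The unweighted value grows just because all probabilities grew.
print(unweighted_estimate(log), unweighted_estimate(doubled))
# The reweighted value stays the same under this uniform scaling.
print(reweighted_estimate(log), reweighted_estimate(doubled))
```

With non-negative rewards, the unweighted objective can be increased by raising the probability of every logged output, including poorly rated ones. The normalized version is invariant to that uniform scaling. Its denominator sums over the whole log, however, which is one plausible reason such reweighting is awkward to combine with minibatch stochastic gradient updates.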

Code & Models

Repositories

carolinlawrence/nematus (Official)