TL;DR
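This paper demonstrates that counterfactual learning from logged human feedback can significantly improve neural semantic parsers. It reweights the estimator to avoid known degeneracies of counterfactual learning while keeping it compatible with stochastic gradient optimization, and it collects feedback through an easy-to-use interface, which makes human-in-the-loop training practical.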
Contribution
It introduces a novel application of counterfactual learning to neural semantic parsing and proposes a reweighting of the estimator that prevents known degeneracies while remaining usable with stochastic gradient optimization.
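The degeneracy that reweighting guards against can be seen on a toy log. The sketch below assumes deterministic logging (so there are no propensities to divide by), rewards in [0, 1], and made-up probabilities that a target parser might assign to each logged parse. The function names and numbers are hypothetical. It compares a plain average of reward times probability with a self-normalized variant that divides by the total probability mass. Self-normalization is one standard form of reweighting and may differ in detail from the paper's estimator.

```python
# Minimal sketch: plain vs. self-normalized estimates of expected reward from a
# deterministically logged dataset (hypothetical numbers, no propensities).


def plain_estimate(rewards, probs):
    """Average of reward times target-model probability of the logged parse."""
    return sum(r * p for r, p in zip(rewards, probs)) / len(rewards)


def reweighted_estimate(rewards, probs):
    """Self-normalized variant: divide by total probability mass instead of n."""
    return sum(r * p for r, p in zip(rewards, probs)) / sum(probs)


# User feedback on four logged parses: 1.0 = judged correct, 0.0 = judged wrong.
rewards = [1.0, 0.0, 0.0, 1.0]
# Setting A: probability mass on the well-rated parses only.
probs_a = [0.9, 0.1, 0.1, 0.9]
# Setting B (degenerate): every logged parse pushed to probability 1.
probs_b = [1.0, 1.0, 1.0, 1.0]

for name, probs in [("A", probs_a), ("B", probs_b)]:
    print(name, plain_estimate(rewards, probs), reweighted_estimate(rewards, probs))
```

Under the plain estimate, the degenerate setting B scores higher than setting A, even though B assigns probability 1 to the badly rated parses too. After self-normalization the ranking flips, because inflating the probability of poorly rated parses now also raises the denominator.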
Findings
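Semantic parsers improved significantly through counterfactual learning from logged human feedback
Developed an easy-to-use interface for collecting human feedback on semantic parses
Reweighted the counterfactual estimator to avoid known degeneracies while keeping it compatible with stochastic gradient optimization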
Abstract
Counterfactual learning from human bandit feedback describes a scenario where user feedback on the quality of outputs of a historic system is logged and used to improve a target system. We show how to apply this learning framework to neural semantic parsing. From a machine learning perspective, the key challenge lies in a proper reweighting of the estimator so as to avoid known degeneracies in counterfactual learning, while still being applicable to stochastic gradient optimization. To conduct experiments with human users, we devise an easy-to-use interface to collect human feedback on semantic parses. Our work is the first to show that semantic parsers can be improved significantly by counterfactual learning from logged human feedback data.
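Recomputing a self-normalized objective exactly over the entire log would require rescoring all logged data after every update, because the normalizer depends on the parameters. One way to keep a reweighted objective usable with stochastic gradient optimization, assumed here purely for illustration, is to self-normalize within each minibatch and ascend its exact gradient, (1/Z) * sum_i (r_i - R_hat) * grad p_i. Here Z is the minibatch sum of logged-parse probabilities and R_hat is the reweighted reward estimate. The sketch replaces the neural parser with a hypothetical log-linear scorer over a fixed candidate set, and all helper names and data are invented. The paper's actual reweighting scheme may differ.

```python
import math
import random

# Minimal sketch: SGD on a minibatch self-normalized counterfactual objective
#   J(theta) = sum_i r_i * p_theta(y_i | x_i) / sum_i p_theta(y_i | x_i)
# for a hypothetical log-linear "parser" that scores a fixed set of candidate parses.


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def parse_probs(theta, feats):
    """p_theta(. | x) over candidates; feats holds one feature vector per candidate."""
    return softmax([sum(t * f for t, f in zip(theta, fv)) for fv in feats])


def prob_and_grad(theta, feats, y):
    """Return p_theta(y | x) and its gradient p_y * (phi_y - E_p[phi])."""
    p = parse_probs(theta, feats)
    expected = [sum(p[k] * feats[k][j] for k in range(len(feats))) for j in range(len(theta))]
    grad = [p[y] * (feats[y][j] - expected[j]) for j in range(len(theta))]
    return p[y], grad


def reweighted_sgd_step(theta, batch, lr):
    """One gradient-ascent step on J computed over a minibatch of (feats, y, r) triples."""
    probs, grads, rewards = [], [], []
    for feats, y, r in batch:
        p, g = prob_and_grad(theta, feats, y)
        probs.append(p)
        grads.append(g)
        rewards.append(r)
    z = sum(probs)
    r_hat = sum(r * p for r, p in zip(rewards, probs)) / z
    # Exact gradient of the ratio: (1/z) * sum_i (r_i - r_hat) * grad p_i.
    step = [sum((r - r_hat) * g[j] for r, g in zip(rewards, grads)) / z
            for j in range(len(theta))]
    return [t + lr * s for t, s in zip(theta, step)]


# Hypothetical log: two candidate parses per input; users rated parse 0 correct
# (reward 1.0) and parse 1 wrong (reward 0.0) whenever it was the logged output.
random.seed(0)
feats = [[1.0, 0.0], [0.0, 1.0]]
log = [(feats, 0, 1.0), (feats, 1, 0.0)] * 50
theta = [0.0, 0.0]
for epoch in range(20):
    random.shuffle(log)
    for start in range(0, len(log), 10):
        theta = reweighted_sgd_step(theta, log[start:start + 10], lr=0.5)

print("p(parse 0):", parse_probs(theta, feats)[0])
```

Subtracting r_hat acts like a baseline. Logged parses rated below the current estimate are pushed down rather than up, which removes the incentive to inflate the probability of every logged parse regardless of its feedback.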
