Don't paraphrase, detect! Rapid and Effective Data Collection for Semantic Parsing

Jonathan Herzig; Jonathan Berant

arXiv:1908.09940·cs.CL·August 30, 2019

TL;DR

This paper introduces a data collection method for semantic parsing that combines crowdsourcing with a paraphrase model. By reducing the mismatch between the collected data and the true distribution of examples, it substantially improves accuracy.

Contribution

The paper identifies two sources of distribution mismatch in paraphrasing-based data collection, one in the logical form distribution and one in the language distribution, and proposes an approach that uses unlabeled data and a paraphrase model to improve semantic parsing accuracy.

Findings

01

Achieved 70.6% accuracy on the true data distribution.

02

Outperformed paraphrasing-based data collection, which reached 51.3% accuracy.

03

Effectively mitigated distribution mismatch issues.

Abstract

A major hurdle on the road to conversational interfaces is the difficulty in collecting data that maps language utterances to logical forms. One prominent approach for data collection has been to automatically generate pseudo-language paired with logical forms, and paraphrase the pseudo-language to natural language through crowdsourcing (Wang et al., 2015). However, this data collection procedure often leads to low performance on real data, due to a mismatch between the true distribution of examples and the distribution induced by the data collection procedure. In this paper, we thoroughly analyze two sources of mismatch in this process: the mismatch in logical form distribution and the mismatch in language distribution between the true and induced distributions. We quantify the effects of these mismatches, and propose a new data collection approach that mitigates them. Assuming access…

Code & Models

Repositories

jonathanherzig/semantic-parsing-annotation (PyTorch, official)
