Addressing Resource and Privacy Constraints in Semantic Parsing Through Data Augmentation
Kevin Yang, Olivia Deng, Charles Chen, Richard Shin, Subhro Roy, Benjamin Van Durme

TL;DR
This paper proposes a data augmentation method for low-resource semantic parsing: it generates structured canonical utterances corresponding to logical forms, simulates the corresponding natural language, and filters the resulting pairs, yielding a 33% relative improvement in top-1 match on SMCalFlow under realistic constraints.
Contribution
It introduces a novel data augmentation approach tailored for low-resource, privacy-sensitive semantic parsing scenarios with no reliance on related datasets or direct grammar sampling.
Findings
33% relative improvement in top-1 match on the SMCalFlow dataset in a low-resource setting
Effective data augmentation despite restrictive real-world constraints
Demonstrates viability of structured utterance generation for low-resource parsing
Abstract
We introduce a novel setup for low-resource task-oriented semantic parsing which incorporates several constraints that may arise in real-world scenarios: (1) lack of similar datasets/models from a related domain, (2) inability to sample useful logical forms directly from a grammar, and (3) privacy requirements for unlabeled natural utterances. Our goal is to improve a low-resource semantic parser using utterances collected through user interactions. In this highly challenging but realistic setting, we investigate data augmentation approaches involving generating a set of structured canonical utterances corresponding to logical forms, before simulating corresponding natural language and filtering the resulting pairs. We find that such approaches are effective despite our restrictive setup: in a low-resource setting on the complex SMCalFlow calendaring dataset (Andreas et al., 2020), we…
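The augmentation pipeline described in the abstract (canonical utterances paired with logical forms, simulated natural language, then filtering) can be illustrated with a minimal sketch. Everything here is an assumption for illustration rather than the authors' implementation: the function names `augment`, `toy_simulate`, and `toy_keep` are hypothetical, the logical form string is a made-up placeholder rather than real SMCalFlow syntax, and the simulator and filter are trivial stand-ins for what would presumably be learned models.

```python
from typing import Callable, List, Tuple

# (utterance, logical form) pair; both are plain strings in this sketch.
Pair = Tuple[str, str]


def augment(
    canonical_pairs: List[Pair],
    simulate_natural: Callable[[str], List[str]],
    keep: Callable[[str, str], bool],
) -> List[Pair]:
    """Turn (canonical utterance, logical form) pairs into filtered
    (natural utterance, logical form) training pairs."""
    augmented: List[Pair] = []
    for canonical, logical_form in canonical_pairs:
        # Step 1: simulate natural phrasings of the canonical utterance.
        for utterance in simulate_natural(canonical):
            # Step 2: filter out candidate pairs judged unreliable.
            if keep(utterance, logical_form):
                augmented.append((utterance, logical_form))
    return augmented


def toy_simulate(canonical: str) -> List[str]:
    # Hypothetical stand-in for a canonical-to-natural model.
    # The empty string mimics a degenerate output the filter should drop.
    return [canonical.replace("create event called", "schedule"), ""]


def toy_keep(utterance: str, logical_form: str) -> bool:
    # Hypothetical filter: keep only non-empty utterances.
    return bool(utterance.strip())


# Placeholder logical form, not actual SMCalFlow syntax.
seed = [("create event called lunch", '(CreateEvent (name "lunch"))')]
result = augment(seed, toy_simulate, toy_keep)
```

Passing the simulator and filter in as callables keeps the loop independent of how either step is realized, so a stronger paraphrase model or a parser-based consistency check could be swapped in without changing `augment`.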
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
