Learning to Synthesize Data for Semantic Parsing
Bailin Wang, Wenpeng Yin, Xi Victoria Lin, Caiming Xiong

TL;DR
This paper introduces a generative model combining a simple PCFG and a pre-trained BART to synthesize diverse data for semantic parsing, improving generalization in text-to-SQL tasks.
Contribution
It presents a novel, efficient data synthesis approach that leverages a non-neural PCFG and BART, enabling better exploration of unseen programs and enhancing parser performance.
Findings
Synthesized data improves semantic parser accuracy.
Model enhances domain and compositional generalization.
The generative model can be learned efficiently from existing data.
Abstract
Synthesizing data for semantic parsing has gained increasing attention recently. However, most methods require handcrafted (high-precision) rules in their generative process, hindering the exploration of diverse unseen data. In this work, we propose a generative model that features a (non-neural) PCFG that models the composition of programs (e.g., SQL), and a BART-based translation model that maps a program to an utterance. Due to the simplicity of the PCFG and the use of pre-trained BART, our generative model can be efficiently learned from existing data at hand. Moreover, explicitly modeling compositions using the PCFG leads to a better exploration of unseen programs, thus generating more diverse data. We evaluate our method in both in-domain and out-of-domain settings of text-to-SQL parsing on the standard benchmarks of GeoQuery and Spider, respectively. Our empirical results show that the synthesized…
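The first component described in the abstract, a PCFG over programs learned from existing data, can be illustrated with a minimal Python sketch. The toy grammar, the (lhs, rhs) rule format, and the function names `estimate_pcfg` and `sample_program` are hypothetical and not taken from the paper; the sketch only shows maximum-likelihood rule estimation by relative frequency and top-down sampling, and it omits the BART program-to-utterance step entirely.

```python
import random
from collections import Counter, defaultdict

# Hypothetical training data: each program is represented by the sequence of
# grammar rules (lhs, rhs) used in its derivation. A real system would obtain
# these by parsing existing SQL queries with a SQL grammar.
TRAIN_DERIVATIONS = [
    # SELECT name FROM city
    [("Q", ("SELECT", "COL", "FROM", "TAB")),
     ("COL", ("name",)), ("TAB", ("city",))],
    # SELECT population FROM city WHERE population > 1000
    [("Q", ("SELECT", "COL", "FROM", "TAB", "WHERE", "COND")),
     ("COL", ("population",)), ("TAB", ("city",)),
     ("COND", ("COL", ">", "VAL")), ("COL", ("population",)), ("VAL", ("1000",))],
]


def estimate_pcfg(derivations):
    """Maximum-likelihood PCFG: P(lhs -> rhs) = count(lhs -> rhs) / count(lhs)."""
    rule_counts = Counter(rule for deriv in derivations for rule in deriv)
    lhs_counts = Counter(lhs for deriv in derivations for lhs, _ in deriv)
    grammar = defaultdict(list)
    for (lhs, rhs), count in rule_counts.items():
        grammar[lhs].append((rhs, count / lhs_counts[lhs]))
    return dict(grammar)


def sample_program(grammar, rng, symbol="Q", depth=0, max_depth=20):
    """Expand `symbol` top-down, choosing each rule independently by its probability."""
    if symbol not in grammar:  # symbols with no rules are terminal tokens
        return [symbol]
    if depth > max_depth:
        raise RecursionError("derivation exceeded max_depth")
    rhs_options, probs = zip(*grammar[symbol])
    rhs = rng.choices(rhs_options, weights=probs, k=1)[0]
    tokens = []
    for child in rhs:
        tokens.extend(sample_program(grammar, rng, child, depth + 1, max_depth))
    return tokens


if __name__ == "__main__":
    grammar = estimate_pcfg(TRAIN_DERIVATIONS)
    rng = random.Random(0)
    seen = {"SELECT name FROM city",
            "SELECT population FROM city WHERE population > 1000"}
    for _ in range(5):
        program = " ".join(sample_program(grammar, rng))
        tag = "seen" if program in seen else "unseen"
        print(f"[{tag}] {program}")
```

Because each rule is chosen independently of its context, the sampler can recombine fragments into programs absent from the training set (e.g. `SELECT name FROM city WHERE population > 1000`), which is the kind of exploration of unseen programs that the abstract attributes to explicit composition modeling. The same independence also lets this toy sampler overgenerate odd queries such as `SELECT name FROM city WHERE name > 1000`.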
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Web Data Mining and Analysis
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · Dropout · Layer Normalization · Byte Pair Encoding · Residual Connection · Dense Connections
