Generating Synthetic Data for Task-Oriented Semantic Parsing with Hierarchical Representations
Ke Tran, Ming Tan

TL;DR
This paper presents a method to generate synthetic hierarchical semantic parsing data using a pretrained BART model, aiming to reduce the need for costly labeled data in complex conversational AI tasks.
Contribution
The authors propose an approach that combines template extraction, BART fine-tuning, and an auxiliary parser to generate high-quality synthetic data for hierarchical semantic parsing.
Findings
Synthetic data improves parser performance on the Facebook TOP dataset.
The approach reduces reliance on manually labeled hierarchical data.
Filtered synthetic data maintains high quality for training semantic parsers.
Abstract
Modern conversational AI systems support natural language understanding for a wide variety of capabilities. While a majority of these tasks can be accomplished using a simple and flat representation of intents and slots, more sophisticated capabilities require complex hierarchical representations supported by semantic parsing. State-of-the-art semantic parsers are trained using supervised learning with data labeled according to a hierarchical schema which might be costly to obtain or not readily available for a new domain. In this work, we explore the possibility of generating synthetic data for neural semantic parsing using a pretrained denoising sequence-to-sequence model (i.e., BART). Specifically, we first extract masked templates from the existing labeled utterances, and then fine-tune BART to generate synthetic utterances conditioning on the extracted templates. Finally, we use an…
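The template-extraction step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the bracketed parse format used by the Facebook TOP dataset (`[IN:...` for intents, `[SL:...` for slots, `]` to close a node) and assumes that a template keeps the labels and brackets while replacing each run of utterance words with a single mask token. The function name `extract_template`, the `<mask>` placeholder, and the example parse are illustrative choices, not details taken from the paper.

```python
MASK = "<mask>"  # assumed placeholder; chosen to match BART's mask token string


def extract_template(parse: str) -> str:
    """Keep intent/slot labels and closing brackets; collapse each run of
    utterance words into a single mask token."""
    out = []
    for tok in parse.split():
        is_structure = tok == "]" or tok.startswith("[IN:") or tok.startswith("[SL:")
        if is_structure:
            out.append(tok)
        elif not out or out[-1] != MASK:
            # First word of a run of utterance words: emit one mask.
            out.append(MASK)
        # Later words in the same run are dropped (already covered by the mask).
    return " ".join(out)


# Hypothetical TOP-style annotated utterance, for illustration only.
example = "[IN:GET_DIRECTIONS Directions to [SL:DESTINATION the park ] ]"
template = extract_template(example)
print(template)
```

Under these assumptions, a fine-tuned sequence-to-sequence model would take such a template as input and fill in the masked spans to produce a new annotated utterance; that generation step is not shown here.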
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
Methods: Linear Layer · Multi-Head Attention · Layer Normalization · Byte Pair Encoding · Softmax · Dense Connections · Residual Connection · Adam · Dropout
