Augmenting Multi-Turn Text-to-SQL Datasets with Self-Play
Qi Liu, Zihuiwen Ye, Tao Yu, Phil Blunsom, Linfeng Song

TL;DR
This paper introduces a self-play data augmentation method for multi-turn text-to-SQL tasks, improving model accuracy and generalization by synthesizing new conversational interactions.
Contribution
It presents a novel self-play approach that generates synthetic multi-turn interactions to enhance training data for context-dependent text-to-SQL models.
Findings
Self-play improves accuracy on SParC and CoSQL datasets.
Synthetic interactions help models generalize across domains.
Beam-search performance also improves when models are trained on the augmented data.
Abstract
The task of context-dependent text-to-SQL aims to convert multi-turn user utterances to formal SQL queries. This is a challenging task due to both the scarcity of training data from which to learn complex contextual dependencies and to generalize to unseen databases. In this paper we explore augmenting the training datasets using self-play, which leverages contextual information to synthesize new interactions to adapt the model to new databases. We first design a SQL-to-text model conditioned on a sampled goal query, which represents a user's intent, that then converses with a text-to-SQL semantic parser to generate new interactions. We then filter the synthesized interactions and retrain the models with the augmented data. We find that self-play improves the accuracy of a strong baseline on SParC and CoSQL, two widely used cross-domain text-to-SQL datasets. Our analysis shows that…
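The loop described in the abstract (sample a goal query, let a SQL-to-text user simulator converse with a text-to-SQL parser, then filter the results) can be sketched roughly as below. This is a minimal illustration, not the authors' implementation: `user_sim` and `parser` are hypothetical callables standing in for the trained models, the toy stand-ins at the bottom exist only so the example runs, and the filter (keep an interaction only if its last predicted query matches the goal) is an assumption, since the abstract does not state the filtering criterion. Retraining on the kept interactions is not shown.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (user utterance, predicted SQL)


@dataclass
class Interaction:
    goal_sql: str
    turns: List[Turn] = field(default_factory=list)


def normalize(sql: str) -> str:
    """Crude canonical form: lowercase and collapse whitespace."""
    return " ".join(sql.lower().split())


def self_play(goal_sql: str,
              user_sim: Callable[[str, List[Turn]], str],
              parser: Callable[[str, List[Turn]], str],
              max_turns: int = 4) -> Interaction:
    """Let the simulated user and the parser converse toward goal_sql."""
    inter = Interaction(goal_sql)
    for _ in range(max_turns):
        utterance = user_sim(goal_sql, inter.turns)  # SQL-to-text side
        sql = parser(utterance, inter.turns)         # text-to-SQL side
        inter.turns.append((utterance, sql))
        if normalize(sql) == normalize(goal_sql):    # goal reached, stop early
            break
    return inter


def keep(inter: Interaction) -> bool:
    """Assumed filter: the last predicted query must match the goal."""
    return bool(inter.turns) and normalize(inter.turns[-1][1]) == normalize(inter.goal_sql)


def augment(goals: List[str], user_sim, parser) -> List[Interaction]:
    """Run self-play for each sampled goal and keep interactions that pass the filter."""
    return [i for i in (self_play(g, user_sim, parser) for g in goals) if keep(i)]


# Toy stand-ins (hypothetical) so the sketch is runnable without trained models.
def toy_user_sim(goal_sql: str, history: List[Turn]) -> str:
    return "list all singer names" if not history else "only those older than 30"


_TOY_PARSES = {
    "list all singer names": "SELECT name FROM singer",
    "only those older than 30": "SELECT name FROM singer WHERE age > 30",
}


def toy_parser(utterance: str, history: List[Turn]) -> str:
    return _TOY_PARSES.get(utterance, "SELECT 1")


if __name__ == "__main__":
    goals = ["SELECT name FROM singer WHERE age > 30",
             "SELECT count(*) FROM singer"]
    for inter in augment(goals, toy_user_sim, toy_parser):
        print(inter.goal_sql, len(inter.turns))
```

With the toy stand-ins, the first goal is reached on the second turn and kept, while the second goal is never reached and is filtered out. In a real setup the kept interactions would be added to the training set before retraining both models.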
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
