Augmenting Multi-Turn Text-to-SQL Datasets with Self-Play

Qi Liu; Zihuiwen Ye; Tao Yu; Phil Blunsom; Linfeng Song

arXiv:2210.12096·cs.CL·October 24, 2022·1 cites

TL;DR

This paper introduces a self-play data augmentation method for multi-turn text-to-SQL tasks, improving model accuracy and generalization by synthesizing new conversational interactions.

Contribution

It presents a novel self-play approach that generates synthetic multi-turn interactions to enhance training data for context-dependent text-to-SQL models.

Findings

01. Self-play improves accuracy on the SParC and CoSQL datasets.

02. Synthetic interactions help models generalize across domains.

03. Beam-search performance improves when models are trained on the augmented data.

Abstract

The task of context-dependent text-to-SQL aims to convert multi-turn user utterances to formal SQL queries. This is challenging because training data is scarce, both for learning complex contextual dependencies and for generalizing to unseen databases. In this paper we explore augmenting the training datasets using self-play, which leverages contextual information to synthesize new interactions that adapt the model to new databases. We first design a SQL-to-text model conditioned on a sampled goal query, which represents a user's intent; this model then converses with a text-to-SQL semantic parser to generate new interactions. We then filter the synthesized interactions and retrain the models with the augmented data. We find that self-play improves the accuracy of a strong baseline on SParC and CoSQL, two widely used cross-domain text-to-SQL datasets. Our analysis shows that…
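The loop described in the abstract (sample a goal query, let a SQL-to-text user model converse with a text-to-SQL parser, then filter) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `Interaction`, `self_play`, `keep`, `toy_user`, and `toy_parser` are hypothetical names, the two toy functions are rule-based stand-ins for the learned models, and the filter shown (keep an interaction only if the final predicted query equals the goal) is an assumption about one plausible filtering criterion.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (user utterance, SQL predicted by the parser)


@dataclass
class Interaction:
    goal_sql: str
    turns: List[Turn] = field(default_factory=list)


def self_play(goal_sql: str,
              user_model: Callable[[str, List[Turn]], str],
              parser: Callable[[str, List[Turn]], str],
              max_turns: int = 4) -> Interaction:
    """Let a user model and a parser converse about one goal query."""
    inter = Interaction(goal_sql)
    for _ in range(max_turns):
        # The user model sees the goal (the simulated intent) and the history.
        utterance = user_model(goal_sql, inter.turns)
        # The parser sees only the utterance and the history, never the goal.
        predicted = parser(utterance, inter.turns)
        inter.turns.append((utterance, predicted))
        if predicted == goal_sql:
            break
    return inter


def keep(inter: Interaction) -> bool:
    """Assumed filter: keep interactions that end on the goal query."""
    return bool(inter.turns) and inter.turns[-1][1] == inter.goal_sql


# Toy stand-ins for the learned models (hypothetical, rule-based).
def toy_user(goal_sql: str, history: List[Turn]) -> str:
    base, _, cond = goal_sql.partition(" WHERE ")
    if not history:
        return "list " + base
    return "only where " + cond


def toy_parser(utterance: str, history: List[Turn]) -> str:
    if utterance.startswith("list "):
        return utterance[len("list "):]
    if utterance.startswith("only where ") and history:
        return history[-1][1] + " WHERE " + utterance[len("only where "):]
    return ""


goals = ["SELECT name FROM singer WHERE age > 30",
         "SELECT count(*) FROM concert"]
synthetic = [self_play(g, toy_user, toy_parser) for g in goals]
augmented = [inter for inter in synthetic if keep(inter)]
```

In this sketch `augmented` would then be merged with the original training interactions before retraining the parser; the retraining step itself is outside the scope of the example.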

Code & Models

Repositories

leuchine/self_play_picard (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems