Diverse Parallel Data Synthesis for Cross-Database Adaptation of Text-to-SQL Parsers
Abhijeet Awasthi, Ashutosh Sathe, Sunita Sarawagi

TL;DR
This paper introduces ReFill, a novel framework for synthesizing diverse, high-quality text queries to adapt Text-to-SQL parsers to new databases, significantly improving performance over existing data augmentation methods.
Contribution
ReFill is a new retrieval-and-edit based approach that generates diverse parallel datasets for cross-database Text-to-SQL adaptation, outperforming prior methods.
Findings
ReFill produces more diverse text queries than standard SQL-to-Text generation methods.
Fine-tuning with ReFill-synthesized data improves parser accuracy.
ReFill outperforms previous data augmentation techniques across multiple databases.
Abstract
Text-to-SQL parsers typically struggle with databases unseen during training. Adapting parsers to new databases is a challenging problem due to the lack of natural language queries in the new schemas. We present ReFill, a framework for synthesizing high-quality and textually diverse parallel datasets for adapting a Text-to-SQL parser to a target schema. ReFill learns to retrieve and edit text queries from the existing schemas and transfers them to the target schema. We show that retrieving diverse existing text, masking their schema-specific tokens, and refilling with tokens relevant to the target schema leads to significantly more diverse text queries than achievable by standard SQL-to-Text generation methods. Through experiments spanning multiple databases, we demonstrate that fine-tuning parsers on datasets synthesized using ReFill consistently outperforms the prior…
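The retrieve, mask, and refill steps described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: ReFill learns its retrieval and editing components, whereas the sketch below substitutes keyword overlap for retrieval and plain string replacement for masking and refilling. All function names, the example schemas, and the example queries are hypothetical and chosen only for illustration.

```python
import re

# Assumption: a small fixed keyword set stands in for a real SQL structure comparison.
SQL_KEYWORDS = {"select", "count", "from", "where", "group", "order", "by", "join"}


def sql_skeleton(sql):
    """Return the set of SQL keywords used in a query, ignoring schema names."""
    return {tok for tok in re.findall(r"[a-z_]+", sql.lower()) if tok in SQL_KEYWORDS}


def retrieve(target_sql, corpus, k=1):
    """Toy stand-in for a learned retriever: rank (text, sql) pairs from
    existing schemas by Jaccard overlap of their SQL keyword sets."""
    target = sql_skeleton(target_sql)

    def score(pair):
        other = sql_skeleton(pair[1])
        union = target | other
        return len(target & other) / len(union) if union else 0.0

    return sorted(corpus, key=score, reverse=True)[:k]


def mask_schema_tokens(text, schema_terms):
    """Replace each schema-specific term in the text with a [MASK] slot."""
    # Longest terms first, so multi-word terms are masked before their substrings.
    for term in sorted(schema_terms, key=len, reverse=True):
        text = re.sub(rf"\b{re.escape(term)}\b", "[MASK]", text, flags=re.IGNORECASE)
    return text


def refill(masked_text, target_terms):
    """Fill [MASK] slots left to right with target-schema terms.
    A learned editor would generate these; here they are supplied by hand."""
    slots = masked_text.count("[MASK]")
    if slots != len(target_terms):
        raise ValueError(f"expected {slots} terms, got {len(target_terms)}")
    terms = iter(target_terms)
    return re.sub(r"\[MASK\]", lambda _match: next(terms), masked_text)


# Hypothetical existing-schema examples and a hypothetical target-schema query.
corpus = [
    ("How many singers are there?", "SELECT count(*) FROM singer"),
    ("List the country of all singers.", "SELECT country FROM singer"),
]
target_sql = "SELECT department FROM employee"

source_text, _source_sql = retrieve(target_sql, corpus, k=1)[0]
masked = mask_schema_tokens(source_text, ["singers", "country"])
synthesized = refill(masked, ["department", "employees"])
print(synthesized)
```

The synthesized sentence, paired with the target SQL, forms one parallel example for the target schema. Because the wording comes from retrieved human-written text rather than from a generator conditioned only on SQL, different retrieved sources yield differently phrased questions, which is the intuition behind the diversity claim.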
Taxonomy
Topics: Scientific Computing and Data Management · Web Data Mining and Analysis · Advanced Database Systems and Queries
