Text-to-SQL Domain Adaptation via Human-LLM Collaborative Data Annotation

Yuan Tian; Daniel Lee; Fei Wu; Tung Mai; Kun Qian; Siddhartha Sahai; Tianyi Zhang; Yunyao Li

arXiv:2502.15980 · cs.HC · November 17, 2025

TL;DR

This paper introduces SQLsynth, a human-LLM collaborative system that efficiently generates high-quality, diverse text-to-SQL datasets, addressing domain adaptation challenges in real-world applications.

Contribution

SQLsynth is a novel human-in-the-loop annotation system that accelerates data creation and improves quality for domain-specific text-to-SQL models.

Findings

1. SQLsynth significantly speeds up data annotation.

2. It reduces cognitive load for annotators.

3. Datasets produced are more accurate, natural, and diverse.

Abstract

Text-to-SQL models, which parse natural language (NL) questions to executable SQL queries, are increasingly adopted in real-world applications. However, deploying such models in the real world often requires adapting them to the highly specialized database schemas used in specific applications. We find that existing text-to-SQL models experience significant performance drops when applied to new schemas, primarily due to the lack of domain-specific data for fine-tuning. This data scarcity also limits the ability to effectively evaluate model performance in new domains. Continuously obtaining high-quality text-to-SQL data for evolving schemas is prohibitively expensive in real-world scenarios. To bridge this gap, we propose SQLsynth, a human-in-the-loop text-to-SQL data annotation system. SQLsynth streamlines the creation of high-quality text-to-SQL datasets through human-LLM…
