ExeSQL: Self-Taught Text-to-SQL Models with Execution-Driven Bootstrapping for SQL Dialects

Jipeng Zhang; Haolin Yang; Kehao Miao; Ruiyuan Zhang; Renjie Pi; Jiahui Gao; Xiaofang Zhou

arXiv:2505.17231·cs.CL·May 26, 2025

TL;DR

ExeSQL introduces an execution-driven bootstrapping approach for self-taught text-to-SQL models, enabling effective adaptation to multiple SQL dialects through iterative, feedback-guided learning and execution-based filtering.

Contribution

The paper presents a novel framework that leverages execution feedback and agentic bootstrapping to improve multi-dialect text-to-SQL models without relying on high-quality dialect-specific datasets.

Findings

01 Achieves 15.2% improvement on PostgreSQL

02 Achieves 10.38% improvement on MySQL

03 Achieves 4.49% improvement on Oracle

Abstract

Recent text-to-SQL models have achieved strong performance, but their effectiveness remains largely confined to SQLite due to dataset limitations. However, real-world applications require SQL generation across multiple dialects with varying syntax and specialized features, which remains a challenge for current models. The main obstacle in building a dialect-aware model lies in acquiring high-quality dialect-specific data. Data generated purely through static prompting - without validating SQLs via execution - tends to be noisy and unreliable. Moreover, the lack of real execution environments in the training loop prevents models from grounding their predictions in executable semantics, limiting generalization despite surface-level improvements from data filtering. This work introduces ExeSQL, a text-to-SQL framework with execution-driven, agentic bootstrapping. The method consists of…
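
To make the idea of validating generated SQL via execution concrete, here is a minimal sketch of execution-based filtering. It is not the authors' pipeline: it assumes an in-memory SQLite database from Python's standard library as a stand-in execution environment (the paper targets dialects such as PostgreSQL, MySQL, and Oracle), and the schema, the candidate pairs, and the helper names `executes_ok` and `filter_by_execution` are hypothetical.

```python
import sqlite3


def executes_ok(conn, sql):
    """Return True if the candidate SQL runs without a database error."""
    try:
        conn.execute(sql).fetchall()
        return True
    except sqlite3.Error:
        return False


def filter_by_execution(conn, candidates):
    """Keep only (question, sql) pairs whose SQL executes against the schema."""
    return [(q, sql) for q, sql in candidates if executes_ok(conn, sql)]


# Hypothetical toy schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.execute("INSERT INTO singer (name, age) VALUES ('Ann', 31), ('Bo', 24)")

# Hypothetical generated candidates; the second one uses a wrong column name.
candidates = [
    ("How many singers are there?", "SELECT COUNT(*) FROM singer"),
    ("List singer names.", "SELECT nme FROM singer"),
    ("Who is oldest?", "SELECT name FROM singer ORDER BY age DESC LIMIT 1"),
]

kept = filter_by_execution(conn, candidates)
for question, sql in kept:
    print(question, "->", sql)
```

A filter like this only checks that a query runs; it does not check that the result answers the question. Swapping the connection for a driver of the target dialect would surface dialect-specific errors that SQLite would not raise.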

Code & Models

Repositories

2003pro/exesql (official)
