SemanticAgent: A Semantics-Aware Framework for Text-to-SQL Data Synthesis
Qiang Gao, Zhenping Li, Anqi Zhuo, Yingxiao Zhao, Weibo Geng, Xiaosong Li

TL;DR
SemanticAgent is a new framework for Text-to-SQL data synthesis that emphasizes semantic correctness, improving data quality and downstream performance by integrating analysis, synthesis, and verification stages.
Contribution
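The paper introduces a semantic-aware synthesis framework built from three specialized modules (an analyzer, a synthesizer, and a verifier) and a three-stage protocol of semantic analysis, stepwise synthesis, and diagnostic refinement, aimed at improving the semantic validity of generated SQL queries.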
Findings
SemanticAgent outperforms prior methods in semantic-quality evaluation.
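The generated data leads to stronger downstream fine-tuning performance, especially on semantically demanding benchmarks.
The framework targets a limitation of execution-based validation: it can retain queries that execute successfully yet violate database semantics.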
Abstract
Existing text-to-SQL synthesis pipelines still conflate executability with semantic validity: syntactic checks and execution-based validation can retain queries that execute successfully while violating database semantics. To address these limitations, we propose SemanticAgent, a semantic-aware synthesis framework. SemanticAgent organizes synthesis around three specialized modules: an analyzer, a synthesizer, and a verifier. Through a three-stage protocol of semantic analysis, stepwise synthesis, and diagnostic refinement, SemanticAgent replaces reliance on execution-based validation alone with a traceable reasoning process. Our framework generates synthetic data that consistently outperforms prior synthesis methods under semantic-quality evaluation, leading to stronger downstream fine-tuning performance, especially on semantically demanding benchmarks.
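To make the gap between executability and semantic validity concrete, here is a minimal sketch using Python's built-in sqlite3 module. The schema, the data, the question, and the helper names `executes` and `run` are all hypothetical and chosen for illustration; none of them come from the paper, and this is not SemanticAgent's verifier. The sketch only shows that a query joining on the wrong key can pass an execution check and return plausible rows.

```python
import sqlite3

# Hypothetical toy schema: orders.customer_id references customers.id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    amount REAL
);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO orders VALUES (1, 2, 50.0), (2, 2, 70.0), (3, 1, 10.0);
""")

# Question (hypothetical): "What is the total order amount per customer?"
valid_sql = """
SELECT c.name, SUM(o.amount)
FROM customers c JOIN orders o ON o.customer_id = c.id
GROUP BY c.name ORDER BY c.name
"""

# Joins the two primary keys instead of following the foreign key.
# Syntactically fine and executable, but semantically wrong.
invalid_sql = """
SELECT c.name, SUM(o.amount)
FROM customers c JOIN orders o ON o.id = c.id
GROUP BY c.name ORDER BY c.name
"""


def executes(connection, sql):
    """Execution-based check: True if the query runs and returns rows."""
    try:
        return len(connection.execute(sql).fetchall()) > 0
    except sqlite3.Error:
        return False


def run(connection, sql):
    """Return all rows produced by the query."""
    return connection.execute(sql).fetchall()


valid_rows = run(conn, valid_sql)
invalid_rows = run(conn, invalid_sql)

print(executes(conn, valid_sql), valid_rows)
print(executes(conn, invalid_sql), invalid_rows)
```

Both queries pass the execution check and return two rows of the same shape, yet only the first answers the question. Telling them apart requires knowing that `orders.customer_id`, not `orders.id`, links an order to its customer, which is the kind of schema-level knowledge the abstract attributes to a semantic analysis stage.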
