RingSQL: Generating Synthetic Data with Schema-Independent Templates for Text-to-SQL Reasoning Models
Marko Sterbentz, Kevin Cushing, Cameron Barrie, Kristian J. Hammond

TL;DR
RingSQL is a hybrid framework that generates high-quality, schema-independent synthetic data for text-to-SQL models, improving their accuracy across multiple benchmarks by combining template-based SQL with LLM paraphrasing.
Contribution
It introduces a novel hybrid data generation method that maintains SQL correctness while enhancing linguistic diversity, addressing limitations of existing synthetic data approaches.
Findings
Models trained with RingSQL data gain +2.3% accuracy on average across six text-to-SQL benchmarks.
RingSQL effectively balances correctness and diversity in synthetic data.
The approach scales well across different schemas and datasets.
Abstract
Recent advances in text-to-SQL systems have been driven by larger models and improved datasets, yet progress is still limited by the scarcity of high-quality training data. Manual data creation is expensive, and existing synthetic methods trade off reliability and scalability. Template-based approaches ensure correct SQL but require schema-specific templates, while LLM-based generation scales easily but lacks quality and correctness guarantees. We introduce RingSQL, a hybrid data generation framework that combines schema-independent query templates with LLM-based paraphrasing of natural language questions. This approach preserves SQL correctness across diverse schemas while providing broad linguistic variety. In our experiments, we find that models trained using data produced by RingSQL achieve an average gain in accuracy of +2.3% across six text-to-SQL benchmarks when compared to…
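The hybrid idea described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the template format, the slot names (`table`, `column`, `value`), the helper functions, and the toy schema are all hypothetical, and `paraphrase` is a rule-based stand-in for the LLM paraphrasing step. The point it shows is that the SQL comes only from the template, so rewording the question cannot break the query.

```python
import sqlite3

# Hypothetical schema-independent template: slots name roles, not concrete tables.
TEMPLATE = {
    "sql": "SELECT COUNT(*) FROM {table} WHERE {column} > {value}",
    "question": "How many rows in {table} have {column} greater than {value}?",
}


def numeric_slots(conn):
    """Yield (table, column) pairs for every numeric column in the schema."""
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        for _, column, col_type, *_ in conn.execute(f"PRAGMA table_info({table})"):
            if col_type.upper() in ("INTEGER", "REAL"):
                yield table, column


def instantiate(conn, value=30):
    """Fill the template for each matching slot and check the SQL executes."""
    examples = []
    for table, column in numeric_slots(conn):
        slots = {"table": table, "column": column, "value": value}
        sql = TEMPLATE["sql"].format(**slots)
        conn.execute(sql)  # raises sqlite3.Error if the query is invalid here
        examples.append({"sql": sql, "question": TEMPLATE["question"].format(**slots)})
    return examples


def paraphrase(question):
    """Assumed stand-in for an LLM call: rewords the question, never the SQL."""
    return question.replace(
        "How many rows in", "What is the number of rows in"
    ).replace(" have ", " with ")


# Toy schema (hypothetical) to exercise the same template on two tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.execute("CREATE TABLE stadium (name TEXT, capacity INTEGER)")

pairs = [
    {"sql": ex["sql"], "question": paraphrase(ex["question"])}
    for ex in instantiate(conn)
]
for pair in pairs:
    print(pair["question"], "->", pair["sql"])
```

In this sketch, one template yields a valid question and SQL pair for each numeric column it finds, without any schema-specific authoring. Swapping `paraphrase` for a real LLM call would change only the question text, which is the separation of concerns the abstract describes.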
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Web Application Security Vulnerabilities
