Grounding Natural Language to SQL Translation with Data-Based Self-Explanations
Yuankai Fan, Tonghui Ren, Can Huang, Zhenying He, X. Sean Wang

TL;DR
CycleSQL is an iterative framework that enhances NL2SQL translation accuracy by self-evaluating and refining outputs using data-grounded natural language explanations, improving existing models on benchmark datasets.
Contribution
Introduces CycleSQL, a novel self-explanatory, iterative approach that improves NL2SQL translation accuracy by leveraging data-grounded natural language explanations for self-evaluation.
Findings
CycleSQL consistently improves existing models' performance.
Applying CycleSQL to RESDSQL achieves 82.0% accuracy on Spider.
NL explanations aid user understanding and interpretability.
Abstract
Natural Language Interfaces for Databases empower non-technical users to interact with data using natural language (NL). Advanced approaches, utilizing either neural sequence-to-sequence or more recent sophisticated large-scale language models, typically implement NL to SQL (NL2SQL) translation in an end-to-end fashion. However, like humans, these end-to-end translation models may not always generate the best SQL output on their first try. In this paper, we propose CycleSQL, an iterative framework designed for end-to-end translation models to autonomously generate the best output through self-evaluation. The main idea of CycleSQL is to introduce data-grounded NL explanations of query results as self-provided feedback, and use the feedback to validate the correctness of the translation iteratively, hence improving the overall translation accuracy. Extensive experiments, including…
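The self-evaluation loop described in the abstract can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the translation model returns a ranked list of SQL candidates (for example, a beam) and uses SQLite to execute them. It also replaces the paper's explanation generator and verifier with simple stand-ins: a template-based explanation (`explain_result`) and a toy keyword check (`keyword_verifier`). Every function name here is hypothetical. The loop keeps the first candidate whose data-grounded explanation passes verification and falls back to the top-ranked candidate if none does.

```python
import re
import sqlite3
from typing import Callable, Optional, Sequence


def explain_result(conn: sqlite3.Connection, sql: str, max_rows: int = 3) -> Optional[str]:
    """Hypothetical helper: run `sql` and describe its result in plain English,
    grounded in the returned rows. Returns None if the query fails to execute."""
    try:
        cur = conn.execute(sql)
    except sqlite3.Error:
        return None  # unexecutable SQL cannot be explained, so it is never accepted
    columns = [d[0] for d in cur.description or []]
    rows = cur.fetchmany(max_rows)
    if not rows:
        return "The query returns no rows."
    facts = [", ".join(f"{col} is {val}" for col, val in zip(columns, row)) for row in rows]
    return "The query returns: " + " | ".join(facts) + "."


def keyword_verifier(question: str, explanation: str) -> bool:
    """Toy stand-in for a learned verifier (an assumption, not the paper's):
    accept only if every capitalized word after the first one in the question,
    a crude proxy for literal values such as 'France', appears in the explanation."""
    words = re.findall(r"[A-Za-z]+", question)
    required = [w for w in words[1:] if w[0].isupper()]
    return all(w in explanation for w in required)


def cycle_select(
    question: str,
    candidates: Sequence[str],
    conn: sqlite3.Connection,
    verify: Callable[[str, str], bool],
    max_candidates: Optional[int] = None,
) -> str:
    """Walk the ranked SQL candidates, explain each one's result, and return the
    first candidate the verifier accepts; otherwise return the top-ranked one."""
    if not candidates:
        raise ValueError("need at least one SQL candidate")
    for sql in candidates[:max_candidates]:
        explanation = explain_result(conn, sql)
        if explanation is not None and verify(question, explanation):
            return sql
    return candidates[0]


def build_demo_db() -> sqlite3.Connection:
    """Tiny in-memory database used only to illustrate the loop."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE singer (name TEXT, country TEXT)")
    conn.executemany("INSERT INTO singer VALUES (?, ?)", [("Ana", "Spain"), ("Luc", "France")])
    return conn
```

Suppose the top-ranked candidate for "Which singers are from France?" wrongly filters on 'Spain', and a second candidate references a nonexistent table. `cycle_select` rejects the first because its explanation never mentions France. It skips the second because the query fails to execute, and it returns a later candidate that filters on 'France'. A real system would replace `keyword_verifier` with a trained model.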
Taxonomy
Topics: Semantic Web and Ontologies · Natural Language Processing Techniques
Methods: Sparse Evolutionary Training
