SQL-to-Schema Enhances Schema Linking in Text-to-SQL
Sun Yang, Qiong Su, Zhishuai Li, Ziyue Li, Hangyu Mao, Chenxi Liu, Rui Zhao

TL;DR
This paper introduces a two-step schema linking method for Text-to-SQL that improves accuracy by generating an initial SQL query to refine schema selection, leading to better performance on the Spider dataset.
Contribution
The paper proposes a novel schema linking approach that refines schema selection through initial SQL generation, reducing errors and improving performance in Text-to-SQL tasks.
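The second step described above, refining the schema from an initial SQL query, can be illustrated with a minimal sketch. It assumes the initial SQL has already been generated by a language model, and it recovers the referenced tables and columns by simple identifier matching. The function name `link_schema_from_sql`, the toy schema, and the matching strategy are all illustrative assumptions, not the authors' implementation.

```python
import re


def link_schema_from_sql(initial_sql, schema):
    """Keep only the tables and columns of `schema` mentioned in `initial_sql`.

    `schema` maps table name -> list of column names (hypothetical format).
    Matching is case-insensitive and purely lexical; words inside string
    literals can cause false matches, so this is a rough filter only.
    """
    # Collect every identifier-like token appearing in the SQL text.
    tokens = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", initial_sql.lower()))
    linked = {}
    for table, columns in schema.items():
        if table.lower() not in tokens:
            continue  # table never referenced: drop it from the prompt
        linked[table] = [c for c in columns if c.lower() in tokens]
    return linked


# Toy schema and initial query, both made up for illustration.
schema = {
    "singer": ["singer_id", "name", "country", "age"],
    "concert": ["concert_id", "concert_name", "year"],
    "stadium": ["stadium_id", "location", "capacity"],
}
initial_sql = "SELECT name FROM singer WHERE age > 30"
refined = link_schema_from_sql(initial_sql, schema)
```

Here `refined` keeps only the `singer` table with its `name` and `age` columns. That reduced schema would then be passed to the final SQL generation step in place of the full database schema.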
Findings
Our method achieves results comparable to mainstream methods on the Spider dataset.
Using CodeLlama-34B, the schema selected by our method performs better than the schemas produced by mainstream methods.
GPT-4 based SQL generation shows promising results with the refined schema.
Abstract
Sophisticated existing Text-to-SQL methods exhibit errors in various proportions, including schema-linking errors (incorrect columns, tables, or extra columns), join errors, nested errors, and group-by errors. Consequently, there is a critical need to filter out unnecessary tables and columns through schema linking, directing the language model's attention to the relevant ones and reducing errors during SQL generation. Previous approaches have involved sorting tables and columns based on their relevance to the question, selecting the top-ranked ones for sorting, or directly identifying the necessary tables and columns for SQL generation. However, these methods face challenges such as lengthy model training times, high consumption of expensive GPT-4 tokens in few-shot prompts, or suboptimal performance in schema linking. Therefore, we propose an inventive schema linking method in…
Taxonomy
Topics: Advanced Database Systems and Queries
Methods: Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Absolute Position Encodings · Byte Pair Encoding · Adam · Dropout · Softmax
