PET-SQL: A Prompt-Enhanced Two-Round Refinement of Text-to-SQL with Cross-consistency
Zhishuai Li, Xiang Wang, Jingjing Zhao, Sun Yang, Guoqing Du, Xiaoru, Hu, Bin Zhang, Yuxiao Ye, Ziyue Li, Rui Zhao, Hangyu Mao

TL;DR
This paper introduces PET-SQL, a two-round refinement framework for Text-to-SQL tasks that leverages cross-consistency among LLMs and enhanced prompts to improve accuracy on complex database queries.
Contribution
It proposes a novel prompt representation and a two-stage refinement process with cross-consistency, significantly advancing Text-to-SQL performance.
Findings
Achieves state-of-the-art 87.6% execution accuracy on Spider benchmark.
Introduces reference-enhanced prompts with schema and sampled cell values.
Utilizes cross-consistency across LLMs for post-refinement.
Abstract
Recent advancements in Text-to-SQL (Text2SQL) emphasize stimulating the large language models (LLM) on in-context learning, achieving significant results. Nevertheless, they face challenges when dealing with verbose database information and complex user intentions. This paper presents a two-stage framework to enhance the performance of current LLM-based natural language to SQL systems. We first introduce a novel prompt representation, called reference-enhanced representation, which includes schema information and randomly sampled cell values from tables to instruct LLMs in generating SQL queries. Then, in the first stage, question-SQL pairs are retrieved as few-shot demonstrations, prompting the LLM to generate a preliminary SQL (PreSQL). After that, the mentioned entities in PreSQL are parsed to conduct schema linking, which can significantly compact the useful information. In the…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsDistributed and Parallel Computing Systems · Scientific Computing and Data Management · Advanced Database Systems and Queries
