PURPLE: Making a Large Language Model a Better SQL Writer

Tonghui Ren; Yuankai Fan; Zhenying He; Ren Huang; Jiaqi Dai; Can Huang; Yinan Jing; Kai Zhang; Yifan Yang; X.Sean Wang

arXiv:2403.20014 · cs.DB · April 1, 2024 · 2 cites

TL;DR

PURPLE improves large language models at natural-language-to-SQL (NL2SQL) translation by retrieving demonstrations that contain the logical operator compositions the task requires, raising accuracy and robustness across benchmarks.

Contribution

The paper introduces PURPLE, a retrieval-based method that guides LLMs with demonstrations to better handle logical operator composition in NL2SQL tasks.
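To make "retrieving demonstrations by logical operator composition" concrete, here is a minimal sketch, not PURPLE's actual algorithm. It assumes a hand-picked operator vocabulary, represents each demonstration's SQL as a set of operators (plus a `NESTED` flag for subqueries), and ranks a demonstration pool by Jaccard overlap with the operator set the target query is expected to need. In practice that target set is unknown and would have to be predicted; here it is passed in by hand. The names `OPERATORS`, `Demo`, `operator_set`, `jaccard`, and `retrieve` are hypothetical.

```python
import re
from dataclasses import dataclass

# Illustrative (assumed) operator vocabulary; not the paper's definition.
OPERATORS = ["JOIN", "WHERE", "GROUP BY", "HAVING", "ORDER BY", "LIMIT",
             "DISTINCT", "UNION", "INTERSECT", "EXCEPT", "NOT IN", "LIKE",
             "COUNT", "SUM", "AVG", "MIN", "MAX"]


@dataclass(frozen=True)
class Demo:
    """One known NL2SQL translation (hypothetical record type)."""
    question: str
    sql: str


def operator_set(sql: str) -> frozenset[str]:
    """Return the operators that occur in `sql`, plus NESTED for subqueries."""
    ops = set()
    for op in OPERATORS:
        # \b keeps COUNT from matching inside identifiers such as "country";
        # \s+ lets multi-word operators like GROUP BY span any whitespace.
        pattern = r"\b" + op.replace(" ", r"\s+") + r"\b"
        if re.search(pattern, sql, re.IGNORECASE):
            ops.add(op)
    # More than one SELECT keyword signals a nested query.
    if len(re.findall(r"\bSELECT\b", sql, re.IGNORECASE)) > 1:
        ops.add("NESTED")
    return frozenset(ops)


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Set similarity in [0, 1]; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def retrieve(target_ops: frozenset[str], pool: list[Demo], k: int) -> list[Demo]:
    """Return the k demos whose operator sets best match `target_ops`."""
    # sorted() is stable even with reverse=True, so ties keep pool order.
    ranked = sorted(pool, key=lambda d: jaccard(target_ops, operator_set(d.sql)),
                    reverse=True)
    return ranked[:k]
```

For example, with `target_ops = frozenset({"GROUP BY", "HAVING", "COUNT"})`, a demonstration whose SQL is `SELECT country FROM singer GROUP BY country HAVING count(*) > 2` scores 1.0 and ranks above `SELECT count(*) FROM singer`, which scores 1/3. Jaccard overlap is only one plausible similarity measure; it ignores operator order and nesting structure.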

Findings

1. Achieves 80.5% exact-set match accuracy on the Spider benchmark.
2. Maintains high accuracy across diverse datasets and models.
3. Demonstrates robustness and cost-effectiveness in NL2SQL translation.

Abstract

Large Language Model (LLM) techniques play an increasingly important role in Natural Language to SQL (NL2SQL) translation. LLMs trained on extensive corpora have strong natural language understanding and basic SQL generation abilities without additional tuning specific to NL2SQL tasks. Existing LLM-based NL2SQL approaches try to improve the translation by enhancing the LLMs with an emphasis on user intention understanding. However, LLMs sometimes fail to generate appropriate SQL due to their lack of knowledge in organizing complex logical operator composition. A promising method is to provide the LLMs with demonstrations, which include known NL2SQL translations from various databases. LLMs can learn to organize operator compositions from the input demonstrations for the given task. In this paper, we propose PURPLE (Pre-trained models Utilized to Retrieve Prompts for Logical Enhancement),…
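Supplying an LLM with known NL2SQL translations ahead of the target question amounts to few-shot prompt construction. The sketch below is a minimal illustration under stated assumptions, not PURPLE's prompt format. It assumes the demonstrations arrive already ranked (best first) and uses character count as a crude stand-in for a token budget, since the findings mention cost-effectiveness under budget limits. The layout ("Question:/SQL:/Schema:"), `build_prompt`, and `max_chars` are hypothetical.

```python
from typing import Sequence


def build_prompt(demos: Sequence[tuple[str, str]], schema: str, question: str,
                 max_chars: int = 2000) -> str:
    """Prepend ranked (question, sql) demos to the target task within a budget."""
    # The target section is always included, so its length is spent first.
    target = f"Schema: {schema}\nQuestion: {question}\nSQL:"
    used = len(target)
    blocks: list[str] = []
    for demo_question, demo_sql in demos:  # assumed ranked, best first
        block = f"Question: {demo_question}\nSQL: {demo_sql}"
        cost = len(block) + 2  # +2 for the "\n\n" separator added by join()
        if used + cost > max_chars:
            break  # stop at the first demo that would exceed the budget
        blocks.append(block)
        used += cost
    blocks.append(target)  # the LLM completes the SQL after the final "SQL:"
    return "\n\n".join(blocks)
```

Stopping at the first demonstration that would overflow (rather than skipping it and trying shorter ones) keeps the included demonstrations a contiguous prefix of the ranking; either policy is a defensible design choice, and a real system would count tokens with the target model's tokenizer instead of characters.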
